Abstract

We propose a bootstrap-based robust high-confidence level upper bound (Robust H-CLUB) for assessing the risks of large portfolios. The proposed approach exploits rank-based and quantile-based estimators, and can be viewed as a robust extension of the H-CLUB method (Fan et al., 2015). Such an extension allows us to handle possibly misspecified models and heavy-tailed data. Under mixing conditions, we analyze the proposed approach and demonstrate its advantage over the H-CLUB. We further provide thorough numerical results to back up the developed theory. We also apply the proposed method to analyze a stock market dataset.

Keywords: High dimensionality; robust inference; rank statistics; quantile statistics; risk 
management; covariance matrix. 

1 Introduction 

Let $R_1,\ldots,R_T$ be a stationary multivariate time series with $R_t\in\mathbb{R}^d$ representing the asset returns at time $t$. Letting $w\in\mathbb{R}^d$ be a portfolio allocation vector, we define the risk of $w$ as

$$\mathrm{Risk}(w):=\big(\mathrm{Var}(w^\top R_1)\big)^{1/2}=\big(w^\top\Sigma w\big)^{1/2},$$

where $\Sigma$ denotes the unknown volatility (or covariance) matrix of $R_t$, i.e., $\Sigma:=\mathrm{Cov}(R_t)$.

Assessing the risk of a portfolio involves two steps. First, we need a covariance matrix estimator $\widehat\Sigma_{\mathrm{est}}$. Second, we construct a confidence interval for $w^\top\Sigma w$ based on $\widehat\Sigma_{\mathrm{est}}$.

Assessing the risk $\mathrm{Risk}(w)$ is challenging when $d$ is large. For example, given a pool of 2,000 candidate assets, the volatility matrix $\Sigma$ involves more than 2 million parameters. However, for daily returns data, the sample size is in general no larger than 500 over one and a half years. This is a typical "small $n$, large $d$" problem, and it leads to the accumulation of estimation errors (Jagannathan and Ma, 2003; Pesaran and Zaffaroni, 2008; Fan et al., 2012). To handle the curse of dimensionality, additional structural regularization is imposed when estimating $\Sigma$. For example, Fan et al. (2008) and Fan et al. (2013) impose a factor model structure on the covariance matrix. The assumed factor structure reduces the effective number of parameters that have to be estimated. In addition, Ledoit and Wolf (2003) propose a shrinkage estimator of $\Sigma$. Moreover, Barndorff-Nielsen (2002), Zhang et al. (2005), and Fan et al. (2012) consider estimating $\Sigma$ based on high-frequency data. Other literature includes Chang and Tsay (2010), Gomez and Gallon (2011), Lai et al. (2011), Fan et al. (2011), Bai and Liao (2012), and Fryzlewicz (2013).

However, most of these papers focus on risk estimation instead of uncertainty assessment. To construct a confidence interval for $w^\top\Sigma w$, Fan et al. (2012) propose to use $\|w\|_1^2\|\widehat\Sigma_{\mathrm{est}}-\Sigma\|_{\max}$¹ as an upper bound of $|w^\top(\widehat\Sigma_{\mathrm{est}}-\Sigma)w|$. However, this bound depends on the unknown $\Sigma$ and has proven to be overly conservative in numerical studies. To handle this problem, Fan et al. (2015) further exploit several sample-covariance-based estimators $\widehat\Sigma_{\mathrm{est}}$ of $\Sigma$ and propose a high-confidence level upper bound (H-CLUB) of $|w^\top(\widehat\Sigma_{\mathrm{est}}-\Sigma)w|$. For a given confidence level $1-\gamma$, under certain moment and dependence assumptions on the time series, the derived H-CLUB dominates $|w^\top(\widehat\Sigma_{\mathrm{est}}-\Sigma)w|$ with probability approaching $1-\gamma$ as both $T$ and $d$ increase to infinity.

This paper proposes new methods for uncertainty assessment of the risks of large portfolios for high-dimensional heavy-tailed data. In particular, we derive confidence intervals for $w^\top\Sigma w$ when the asset returns $R_1,\ldots,R_T$ are elliptically distributed. This setting has been commonly adopted in financial econometrics (Cont, 2001). To handle heavy-tailed data, we propose a new risk uncertainty assessment method named robust high-confidence level upper bound (Robust H-CLUB). The Robust H-CLUB exploits a new block-bootstrap-based approach for uncertainty assessment of $\mathrm{Risk}(w)$. More specifically, we decompose the problem of assessing the risk $w^\top\Sigma w$ into two parts: (i) we propose a robust estimator $\widehat\Sigma_{\mathrm{est}}$ of $\Sigma$; (ii) we derive the variance of $w^\top(\widehat\Sigma_{\mathrm{est}}-\Sigma)w$. For estimating $\Sigma$, we exploit rank-based Kendall's tau estimators and quantile-based median absolute deviation estimators. For estimating the variance of $w^\top(\widehat\Sigma_{\mathrm{est}}-\Sigma)w$, we employ the circular block bootstrap method (Politis and Romano, 1992).

¹We will provide the definitions of the vector $\ell_1$ norm ($\|\cdot\|_1$) and the matrix $\ell_{\max}$ norm ($\|\cdot\|_{\max}$) later.

Theoretically, when $T,d\to\infty$ and $d$ is possibly much larger than $T$, we develop an inferential theory of the robust risk estimators. In particular, we show that $\sqrt{T}\,w^\top(\widehat\Sigma_{\mathrm{est}}-\Sigma)w$ is asymptotically normal with variance $\sigma^2$, and that the block-bootstrap-based estimator of $\sigma^2$ is consistent. The theory holds even when $d$ is nearly exponentially larger than $T$. Moreover, it holds under any elliptical model. Thus we no longer need strong moment conditions on the asset returns, such as an exponentially decaying rate on the tails of their distributions.

1.1 Other Related Work 

There is a vast literature on estimating large sparse or factor-based covariance matrices. Under the assumption that data points are mutually independent, many sample-covariance-based regularization methods have been proposed, including banding (Bickel and Levina, 2008b), tapering (Cai et al., 2010), thresholding (Bickel and Levina, 2008a; Cai and Zhou, 2012), and factor structures (Fan et al., 2008; Agarwal et al., 2012; Hsu et al., 2011). These methods have further been applied to stationary time series data under vector autoregressive dependence (Loh and Wainwright, 2012; Han and Liu, 2013c), mixing conditions (Pan and Yao, 2008; Fan et al., 2011, 2013; Han and Liu, 2013b), and physical dependence (Xiao and Wu, 2012; Chen et al., 2013).

This paper is also related to the literature on estimating large correlation or covariance matrices under misspecified or heavy-tailed models. For example, Han and Liu (2014b), Han and Liu (2013a), Wegkamp and Zhao (2013), Mitra and Zhang (2014), and Fan et al. (2014) exploit rank statistics, while Qiu et al. (2014) focus on quantile statistics. None of these works studies the risk inference problem considered in our paper.

1.2 Notation 

Let $v=(v_1,\ldots,v_d)^\top$ be a $d$-dimensional real vector and $M=[M_{jk}]$ be a $d$ by $d$ real matrix. For $0<q<\infty$, let the vector $\ell_q$ norm be $\|v\|_q:=\big(\sum_{j=1}^d|v_j|^q\big)^{1/q}$ and the vector $\ell_\infty$ norm be $\|v\|_\infty:=\max_{1\le j\le d}|v_j|$. For two subsets $I,J\subseteq\{1,\ldots,d\}$, we denote by $v_I$ the sub-vector of $v$ with entries indexed by $I$, and by $M_{I,J}$ the sub-matrix of $M$ with rows indexed by $I$ and columns indexed by $J$. We denote the matrix $\ell_{\max}$ norm of $M$ by $\|M\|_{\max}:=\max_{j,k}|M_{jk}|$. Letting $N=[N_{jk}]\in\mathbb{R}^{d\times d}$ be another $d$ by $d$ real matrix, we denote by $M\circ N=[M_{jk}N_{jk}]$ the Hadamard product of $M$ and $N$. Letting $f:\mathbb{R}\to\mathbb{R}$ be a real function, we denote by $f(M)=[f(M_{jk})]$ the matrix with $f(M_{jk})$ as its $(j,k)$ entry. We write $M=\mathrm{diag}(M_1,\ldots,M_k)$ if $M$ is block diagonal with diagonal blocks $M_1,\ldots,M_k$. For random vectors $X,Y\in\mathbb{R}^d$, we write $X\overset{d}{=}Y$ if $X$ and $Y$ are identically distributed. Throughout the paper, we use $c,c_1,c_2,\ldots$ and $C,C_1,C_2,\ldots$ to represent generic absolute positive constants, whose actual values may change from one line to another. For any real positive sequences $\{a_n\}$ and $\{b_n\}$, we write $a_n\gtrsim b_n$ if $a_n\ge cb_n$ for some absolute constant $c$ and all large enough $n$. We write $a_n\lesssim b_n$ if $b_n\gtrsim a_n$, and $a_n\asymp b_n$ if $a_n\lesssim b_n$ and $a_n\gtrsim b_n$. For $a\in\mathbb{R}$, we define $\lceil a\rceil$ and $\lfloor a\rfloor$ to be the smallest integer no smaller than $a$ and the largest integer no larger than $a$, respectively.

1.3 Paper Organization 

The rest of this paper is organized as follows. Section 2 introduces the Robust H-CLUB estimator for assessing the uncertainty of the portfolio risk. We consider three settings: (i) the marginal variances of the returns are known; (ii) the marginal variances are unknown, but additional information is available to help determine their values; (iii) the marginal variances are unknown and no additional information is available. Section 3 presents the inferential theory for the risk estimators and justifies the use of the Robust H-CLUB. Sections 4 and 5 present synthetic and real data analyses to back up the developed theory. Section 6 summarizes the results and discusses future work. Section 7 presents all the proofs.

2 Robust H-CLUB 

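This section introduces the Robust H-CLUB method. We consider a multivariate time series of asset returns $R_1,\ldots,R_T$ with $R_t=(R_{t1},\ldots,R_{td})^\top\in\mathbb{R}^d$ for $t=1,\ldots,T$. Let $\Sigma:=\mathrm{Cov}(R_t)$ be the covariance matrix and $D\in\mathbb{R}^{d\times d}$ be the diagonal matrix of marginal standard deviations, with diagonals $D_{jj}=\Sigma_{jj}^{1/2}$ for $j=1,\ldots,d$. It is easy to derive $\Sigma=D\Sigma^0D$, where $\Sigma^0$ is the correlation matrix of $R_t$.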

For a given portfolio allocation vector $w\in\mathbb{R}^d$, we aim to construct a confidence interval for $w^\top\Sigma w$. Throughout this section, our interest is in analyzing heavy-tailed returns, which are common in financial applications.

We exploit the elliptical distribution family to model heavy-tailed data. The elliptical distribution is routinely used in modeling financial data (Owen and Rabinovitch, 1983; Hamada and Valdez, 2004; Frahm and Jaekel, 2007). More specifically, a random vector $Z\in\mathbb{R}^d$ follows an elliptical distribution with mean $\mu\in\mathbb{R}^d$ and positive definite covariance matrix $\Sigma\in\mathbb{R}^{d\times d}$ if

$$Z\overset{d}{=}\mu+\xi AU,$$

where $A\in\mathbb{R}^{d\times d}$ satisfies $AA^\top=\Sigma$, $U$ is uniformly distributed on the unit sphere in $\mathbb{R}^d$, and $\xi$ is an unspecified nonnegative random variable independent of $U$ satisfying $\mathbb{E}\xi^2=d$. We impose the following stationarity assumption on $\{R_t\}_{t=1}^T$:

• (A0). $R_1,\ldots,R_T$ are continuous and identically distributed as an elliptical random vector $R$ with covariance and correlation matrices $\Sigma$ and $\Sigma^0$.
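As a sanity check on the stochastic representation above, the following Python sketch draws from one member of the elliptical family, a multivariate $t$ with $\nu$ degrees of freedom, by constructing $\xi$ explicitly. The $t$ family, the pure-Python Cholesky factor used as $A$, and the names `cholesky` and `sample_elliptical_t` are illustrative assumptions. The choice $\xi=\sqrt{(\nu-2)/W}\,\|G\|$ is one way, not the only one, to obtain $\mathbb{E}\xi^2=d$.

```python
import math
import random

def cholesky(S):
    """Lower-triangular A with A A^T = S, for a small symmetric positive definite S."""
    d = len(S)
    A = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(A[i][k] * A[j][k] for k in range(j))
            if i == j:
                A[i][j] = math.sqrt(S[i][i] - s)
            else:
                A[i][j] = (S[i][j] - s) / A[j][j]
    return A

def sample_elliptical_t(mu, A, nu, rng):
    """One draw of Z = mu + xi * A U with a multivariate-t radial part.

    G ~ N(0, I_d) gives U = G / ||G||, uniform on the unit sphere, and
    W ~ chi^2_nu is drawn independently of G. Setting
    xi = sqrt((nu - 2) / W) * ||G|| makes xi independent of U with
    E[xi^2] = d, so Cov(Z) = A A^T (this requires nu > 2).
    """
    d = len(mu)
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm_g = math.sqrt(sum(x * x for x in g))
    u = [x / norm_g for x in g]
    w = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu degrees of freedom
    xi = math.sqrt((nu - 2.0) / w) * norm_g
    z = [mu[i] + xi * sum(A[i][k] * u[k] for k in range(d)) for i in range(d)]
    return z, xi
```

With $\nu=5$, as in the simulations of Section 4, the marginals have finite fourth moments but heavy tails.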


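For parameter estimation, we define the rank-based Kendall's tau correlation coefficient and the quantile-based median absolute deviation estimators. In detail, given $R_1,\ldots,R_T$, the sample and population Kendall's tau matrices $\widehat T=[\widehat\tau_{jk}]$ and $T=[\tau_{jk}]$ are defined as

$$\widehat\tau_{jk}:=\frac{2}{T(T-1)}\sum_{1\le t<t'\le T}\mathrm{sign}(R_{tj}-R_{t'j})\,\mathrm{sign}(R_{tk}-R_{t'k}),\qquad \tau_{jk}:=\mathbb{E}\,\mathrm{sign}(R_j-\widetilde R_j)\,\mathrm{sign}(R_k-\widetilde R_k),\tag{2.1}$$

where $R=(R_1,\ldots,R_d)^\top$ and $\widetilde R=(\widetilde R_1,\ldots,\widetilde R_d)^\top$ are two independent copies of $R_1$. Under the elliptical model, the Kendall's tau matrix $T$ and the correlation matrix $\Sigma^0$ satisfy (Lindskog et al., 2003)

$$\Sigma^0_{jk}=\sin\Big(\frac{\pi}{2}\tau_{jk}\Big).\tag{2.2}$$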
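Below is a direct, quadratic-time sketch of (2.1) and (2.2) in plain Python. The helper names are hypothetical, and the $O(T^2d^2)$ loop favors readability over speed. Only the signs of pairwise differences enter $\widehat\tau_{jk}$, so it is unchanged by strictly increasing transformations of any coordinate. This invariance is part of what makes the estimator robust.

```python
import math

def sign(x):
    """Sign of x as an int in {-1, 0, 1}."""
    return (x > 0) - (x < 0)

def kendall_tau_matrix(R):
    """Sample Kendall's tau matrix (2.1) for data R given as T rows of length d."""
    T, d = len(R), len(R[0])
    n_pairs = T * (T - 1) // 2
    tau = [[0.0] * d for _ in range(d)]
    for j in range(d):
        for k in range(j, d):
            s = 0
            for t in range(T):
                for u in range(t + 1, T):
                    s += sign(R[t][j] - R[u][j]) * sign(R[t][k] - R[u][k])
            tau[j][k] = tau[k][j] = s / n_pairs
    return tau

def sine_transform(tau):
    """Entrywise sin(pi/2 * tau): the correlation estimate suggested by (2.2)."""
    return [[math.sin(math.pi / 2.0 * x) for x in row] for row in tau]
```

For example, if one column is an increasing function of another (such as `exp`), the corresponding entry of `kendall_tau_matrix` is exactly 1.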

Next, we define the quantile-based median absolute deviation estimator of the scale parameter. We start with some extra notation. Let $X\in\mathbb{R}$ be a random variable and $\{X_t\}_{t=1}^T$ be $T$ realizations of $X$. For any $q\in[0,1]$, we define the population and sample $q$-quantiles² as

$$Q(X;q):=\inf\{x:\mathbb{P}(X\le x)\ge q\},\qquad Q(\{X_t\}_{t=1}^T;q):=X^{(k)},\ \text{where } k=\min\{t:t/T\ge q\}.\tag{2.3}$$


Here $X^{(1)}\le X^{(2)}\le\cdots\le X^{(T)}$ is the ordered sequence of $X_1,\ldots,X_T$. We then define the population and sample median absolute deviations for $\{X_t\}_{t=1}^T$ as the population and sample medians of the absolute values of the centered data. The formal definitions are as follows:

$$\sigma^M(X):=Q\big(|X-Q(X;1/2)|;1/2\big),\qquad \widehat\sigma^M(\{X_t\}_{t=1}^T):=Q\Big(\big\{|X_t-Q(\{X_t\}_{t=1}^T;1/2)|\big\}_{t=1}^T;1/2\Big).\tag{2.4}$$


These are robust alternatives to the population and sample standard deviations. In particular, for an elliptically distributed random vector $R=(R_1,\ldots,R_d)^\top$, Han et al. (2014) prove that

$$\frac{\sigma^M(R_1)}{\mathrm{sd}(R_1)}=\frac{\sigma^M(R_2)}{\mathrm{sd}(R_2)}=\cdots=\frac{\sigma^M(R_d)}{\mathrm{sd}(R_d)},\tag{2.5}$$

where, for an arbitrary random variable $X$, $\mathrm{sd}(X)$ denotes the standard deviation of $X$.

²Let $F$ and $f$ be the distribution function and density function of $X$. We will use $Q(X;q)$, $Q(F;q)$, and $Q(f;q)$ interchangeably.
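The sample quantile in (2.3) and the median absolute deviation in (2.4) can be sketched as below. The function names are hypothetical. The check relies on a known fact: for a Gaussian variable, the common ratio in (2.5) equals $\Phi^{-1}(3/4)\approx0.6745$. Other elliptical families give a different constant, but that constant is still shared by all coordinates.

```python
import math
import statistics

def sample_quantile(xs, q):
    """Q({X_t}; q) = X^(k) with k = min{t : t/T >= q}, as in (2.3)."""
    T = len(xs)
    k = max(1, math.ceil(q * T))  # smallest t in {1, ..., T} with t/T >= q
    return sorted(xs)[k - 1]

def mad(xs):
    """Sample median absolute deviation (2.4): median of |X_t - median|."""
    med = sample_quantile(xs, 0.5)
    return sample_quantile([abs(x - med) for x in xs], 0.5)

# One outlier barely moves the MAD, while the sample standard deviation explodes.
example = [1, 2, 3, 4, 100]
print(mad(example), statistics.stdev(example))
```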

Under the elliptical model and using the rank- and quantile-based estimators, we propose three robust approaches to construct a confidence interval for $w^\top\Sigma w$. Formally, for each proposed robust covariance matrix estimator $\widehat\Sigma_{\mathrm{est}}$ and any given $\gamma\in(0,1)$, we aim to find a $U_{\mathrm{est}}(\gamma)$ such that

$$\mathbb{P}\Big(w^\top\Sigma w\in\big[w^\top\widehat\Sigma_{\mathrm{est}}w-U_{\mathrm{est}}(\gamma),\,w^\top\widehat\Sigma_{\mathrm{est}}w+U_{\mathrm{est}}(\gamma)\big]\Big)\to1-\gamma,$$

as $T,d\to\infty$. The proposed approaches correspond to three scenarios that differ in what is known about $D$.

Of note, a main strategy shared by the three proposed methods is to estimate the marginal standard deviations and the bivariate correlation coefficients separately. In this paper, we focus on measuring the uncertainty introduced by estimating the correlation coefficients, and we assume that the uncertainty introduced by estimating the marginal standard deviations is negligible³. To measure the uncertainty in correlation coefficient estimation, we employ a circular block bootstrap method.

³This is mainly for the purpose of constructing the bootstrap-based inferential theory.

In detail, suppose that we have a robust marginal standard deviation estimator $\widehat D_{\mathrm{est}}$ of $D$. We further derive the correlation matrix estimator $\widehat\Sigma^0_{\mathrm{est}}$ of $\Sigma^0$ based on a $d$-dimensional multivariate time series $X_1,\ldots,X_T$. For any given portfolio allocation vector $w$, we propose to estimate $w^\top\Sigma w$ by

$$\widehat{\mathrm{Risk}}(w):=w^\top\widehat\Sigma_{\mathrm{est}}w,\quad\text{where }\widehat\Sigma_{\mathrm{est}}:=\widehat D_{\mathrm{est}}\widehat\Sigma^0_{\mathrm{est}}\widehat D_{\mathrm{est}}.\tag{2.6}$$

To estimate the asymptotic variance of the estimator $w^\top\widehat\Sigma_{\mathrm{est}}w$, we adopt the circular block bootstrap procedure introduced in Politis and Romano (1992). First, we extend the sample $X_1,\ldots,X_T$ periodically by setting $X_{i+T}=X_i$ for $i\ge1$. We then randomly select a block of $l=\lfloor T^{\epsilon_0}\rfloor$ consecutive observations from the extended sample, for some absolute constant $\epsilon_0<1$ (e.g., $\epsilon_0=0.9$). Because financial time series exhibit weak dependence, the choice of the block size $l$ is not very important. We repeat this process $b=\lceil T/l\rceil$ times independently to obtain a sample $X_1^*,\ldots,X_{bl}^*$, so that for each $k=0,\ldots,b-1$,

$$\mathbb{P}^*\big(X^*_{kl+1}=X_i,\ldots,X^*_{(k+1)l}=X_{i+l-1}\big)=1/T,\quad\text{for } i=1,\ldots,T,$$

where $\mathbb{P}^*$ is the resampling distribution conditional on $X_1,\ldots,X_T$. From each resampled time series $X_1^*,\ldots,X_{bl}^*$, we calculate the correlation matrix estimator $\widehat\Sigma^{0*}_{\mathrm{est}}$. Let $\widehat\Sigma^*_{\mathrm{est}}:=\widehat D_{\mathrm{est}}\widehat\Sigma^{0*}_{\mathrm{est}}\widehat D_{\mathrm{est}}$ be the estimator of $\Sigma$ based on the resampled data, and let $\mathrm{Var}^*(\cdot)$ be the variance operator under the probability mass function $\mathbb{P}^*$. We estimate the asymptotic variance of $w^\top\widehat\Sigma_{\mathrm{est}}w$ by

$$\widehat\sigma^2_{\mathrm{est}}:=\mathrm{Var}^*\big(\sqrt{T}\,w^\top\widehat\Sigma^*_{\mathrm{est}}w\big).$$
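The resampling scheme above can be sketched as follows. The sketch assumes, as in Section 2.1, that $D$ is known and that the correlation estimator is the Kendall's tau sine transform (2.2). Several details are our own assumptions and not part of the stated procedure: the function names, the Monte Carlo approximation of $\mathrm{Var}^*$ by `n_boot` replicates (the default of 50 mirrors Section 4.1), and fixing the diagonal of the resampled Kendall matrix at 1. The last choice matters because repeated bootstrap draws create ties that would otherwise shrink the diagonal.

```python
import math
import random
import statistics

def _sign(x):
    return (x > 0) - (x < 0)

def _kendall_unit_diag(R):
    # Sample Kendall's tau matrix (2.1); diagonal fixed at 1 (assumption, see text).
    T, d = len(R), len(R[0])
    m = T * (T - 1) / 2.0
    tau = [[1.0] * d for _ in range(d)]
    for j in range(d):
        for k in range(j + 1, d):
            s = sum(_sign(R[t][j] - R[u][j]) * _sign(R[t][k] - R[u][k])
                    for t in range(T) for u in range(t + 1, T))
            tau[j][k] = tau[k][j] = s / m
    return tau

def risk_estimate(R, D, w):
    """w^T D sin(pi/2 * tau_hat) D w, i.e. (2.6) with a known diagonal D (list of D_jj)."""
    tau = _kendall_unit_diag(R)
    a = [D[j] * w[j] for j in range(len(w))]
    return sum(a[j] * a[k] * math.sin(math.pi / 2.0 * tau[j][k])
               for j in range(len(a)) for k in range(len(a)))

def circular_block_bootstrap_var(R, D, w, eps0=0.9, n_boot=50, seed=0):
    """Monte Carlo approximation of Var*(sqrt(T) w^T Sigma*_est w)."""
    rng = random.Random(seed)
    T = len(R)
    l = max(1, math.floor(T ** eps0))  # block length l = floor(T^eps0)
    b = math.ceil(T / l)               # number of blocks
    boot_stats = []
    for _ in range(n_boot):
        sample = []
        for _ in range(b):
            i = rng.randrange(T)       # block start, each with probability 1/T
            sample.extend(R[(i + s) % T] for s in range(l))  # wrap around the end
        boot_stats.append(math.sqrt(T) * risk_estimate(sample, D, w))
    return statistics.variance(boot_stats)
```

Reducing modulo $T$ when indexing is what makes the scheme circular: a block starting near the end of the sample wraps around to its beginning, so every start position is equally likely.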

2.1 Known Marginal Volatilities 

In this section we consider the setting where the marginal standard deviations of $R_t$, encoded in $D$, are known. This is an idealized assumption. In practice, one can fit a parametric model, such as the GARCH(1,1) model introduced in Bollerslev (1986), to each individual return time series. Such estimates are much more accurate than the nonparametric ones and can be treated as known.

When $D$ is known, estimating $w^\top\Sigma w$ reduces to estimating the correlation matrix $\Sigma^0$. Using (2.2), under the elliptical model, we focus on the covariance matrix estimator $\widehat\Sigma:=D\sin\big(\frac{\pi}{2}\widehat T\big)D$. We then estimate $w^\top\Sigma w$ by replacing $\widehat\Sigma_{\mathrm{est}}$ with $\widehat\Sigma$ in (2.6). Let $\widehat\sigma^2$ be an estimator of the asymptotic variance of $w^\top\widehat\Sigma w$, calculated with the circular block bootstrap method introduced earlier. Let $\Phi(\cdot)$ be the cumulative distribution function of a standard Gaussian random variable. For any given confidence level $1-\gamma\in(0,1)$, we define the Robust H-CLUB estimator $\widehat U(\gamma)$ as

$$\widehat U(\gamma):=\Phi^{-1}(1-\gamma/2)\sqrt{\widehat\sigma^2/T}.\tag{2.7}$$

The corresponding confidence interval for the risk is

$$\big[w^\top\widehat\Sigma w-\widehat U(\gamma),\,w^\top\widehat\Sigma w+\widehat U(\gamma)\big].\tag{2.8}$$

In Section 3 we will show that, under mild conditions,

$$\widehat\sigma^2=\sigma^2\big(1+o_P(1)\big)\quad\text{and}\quad\mathbb{P}\big(|w^\top(\widehat\Sigma-\Sigma)w|\le\widehat U(\gamma)\big)\to1-\gamma,$$

as $T$ and $d$ go to infinity. Therefore, $[w^\top\widehat\Sigma w-\widehat U(\gamma),\,w^\top\widehat\Sigma w+\widehat U(\gamma)]$ is a valid level-$(1-\gamma)100\%$ interval covering the true $w^\top\Sigma w$.
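Given a point estimate and a bootstrap variance, (2.7) and (2.8) take only a few lines. The sketch below uses Python's `statistics.NormalDist` for $\Phi^{-1}$, and the function name is hypothetical. For $\gamma=0.05$, $\Phi^{-1}(0.975)\approx1.96$, which is close to the factor 2 used for $\widehat U(0.05)$ in Section 4.

```python
from statistics import NormalDist

def robust_hclub(point_estimate, sigma2_hat, T, gamma=0.05):
    """Return U(gamma) from (2.7) and the interval (2.8) around w^T Sigma_hat w."""
    U = NormalDist().inv_cdf(1.0 - gamma / 2.0) * (sigma2_hat / T) ** 0.5
    return U, (point_estimate - U, point_estimate + U)
```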

2.2 Additional Data 

This section considers the setting where historical data are available for estimating $D$. To adapt to current market conditions, we usually pick a time series short enough that the asset returns are approximately stationary over it. However, each univariate time series is likely to be stationary over a longer time scale than the multivariate time series, so we can incorporate extra information when calculating the marginal standard deviations. Motivated by this, we consider a setting where historical information is available. We do not assume the historical data to be multivariately stationary, only marginally stationary. Formally, let $R_1,\ldots,R_T$ be the observed stationary multivariate time series, and let $H_1,\ldots,H_{T_H}$ be the available historical data with $H_t=(H_{t1},\ldots,H_{td})^\top$ and

$$T=O(T_H^{1-\delta}),\quad\text{where }\delta\text{ is an absolute constant.}\tag{2.9}$$

The series $H_1,\ldots,H_{T_H}$ may overlap with $R_1,\ldots,R_T$. However, $H_t$ is not necessarily identically distributed to either $H_{t'}$ or $R_1$ for any $t'\neq t$. Instead, we only assume that

$$H_{1j}\overset{d}{=}H_{2j}\overset{d}{=}\cdots\overset{d}{=}H_{T_Hj}\quad\text{and}\quad\mathrm{Var}(H_{1j})=\mathrm{Var}(R_{1j}),\qquad\text{for } j\in\{1,\ldots,d\}.$$


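We then estimate $w^\top\Sigma w$ by estimating $D$ and $\Sigma^0$ separately. Formally, to estimate $D$, we use the historical data $H_1,\ldots,H_{T_H}$ and derive

$$\widehat D^H:=\mathrm{diag}\big(\widehat D^H_{11},\ldots,\widehat D^H_{dd}\big),\quad\text{where}\quad\widehat D^H_{jj}:=\widehat\sigma^M_j\cdot\frac{\widehat\sigma_1}{\widehat\sigma^M_1},\tag{2.10}$$

Here, for $j=1,\ldots,d$, $\widehat\sigma^M_j$ is the median absolute deviation estimator of $\{H_{tj}\}_{t=1}^{T_H}$, and $\widehat\sigma_1=\big(\widehat{\mathrm{Var}}(\{H_{t1}\}_{t=1}^{T_H})\big)^{1/2}$ is the Pearson sample standard deviation of $\{H_{t1}\}_{t=1}^{T_H}$. To estimate $\Sigma^0$, we calculate the Kendall's tau matrix $\widehat T$ based on $\{R_1,\ldots,R_T\}$.

Remark 2.1. In (2.10), to calculate $\widehat D^H$, we use the term $\widehat\sigma_1/\widehat\sigma^M_1$ to approximate the scaling factor between the median absolute deviation and the Pearson standard deviation. By (2.5), this factor is common to all coordinates. This choice facilitates the theoretical derivations. In practice, we can instead use, for example, the averaged version $\sum_{j=1}^d\widehat\sigma_j/\sum_{j=1}^d\widehat\sigma^M_j$ to estimate the scaling factor, where $\widehat\sigma_j$ is the Pearson sample standard deviation of $\{H_{tj}\}_{t=1}^{T_H}$.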
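The following minimal sketch implements (2.10) together with the averaged scaling factor of Remark 2.1. Here `H` is a list of $T_H$ rows, and both the function name and the `average` flag are hypothetical. When the columns are exact rescalings of one another, both variants reproduce the Pearson standard deviations, which is the property of (2.5) that the estimator relies on.

```python
import math
import statistics

def _quantile(xs, q):
    # Sample q-quantile X^(k) with k = min{t : t/T >= q}, as in (2.3).
    k = max(1, math.ceil(q * len(xs)))
    return sorted(xs)[k - 1]

def _mad(xs):
    # Sample median absolute deviation, as in (2.4).
    m = _quantile(xs, 0.5)
    return _quantile([abs(x - m) for x in xs], 0.5)

def marginal_scales(H, average=False):
    """Diagonal of D^H in (2.10): MAD_j rescaled by sd_1 / MAD_1.

    With average=True the scaling factor is sum_j sd_j / sum_j MAD_j (Remark 2.1).
    """
    d = len(H[0])
    cols = [[row[j] for row in H] for j in range(d)]
    mads = [_mad(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]  # Pearson sample standard deviations
    scale = sum(sds) / sum(mads) if average else sds[0] / mads[0]
    return [scale * m for m in mads]
```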

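To estimate $w^\top\Sigma w$, we replace $\widehat D_{\mathrm{est}}$ by $\widehat D^H$, $\widehat\Sigma^0_{\mathrm{est}}$ by $\sin(\frac{\pi}{2}\widehat T)$, and $\widehat\Sigma_{\mathrm{est}}$ by $\widehat\Sigma^H:=\widehat D^H\sin(\frac{\pi}{2}\widehat T)\widehat D^H$ in (2.6). For any given $1-\gamma\in(0,1)$, we calculate the Robust H-CLUB estimator $\widehat U^H(\gamma)$ as

$$\widehat U^H(\gamma):=\Phi^{-1}(1-\gamma/2)\sqrt{\widehat\sigma_H^2/T},\tag{2.11}$$

where $\widehat\sigma_H^2$ is calculated with the circular block bootstrap method introduced earlier. The corresponding confidence interval for the risk is

$$\big[w^\top\widehat\Sigma^Hw-\widehat U^H(\gamma),\,w^\top\widehat\Sigma^Hw+\widehat U^H(\gamma)\big].\tag{2.12}$$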


2.3 Unknown Marginal Volatilities 

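This section considers the setting where $D$ is unknown and no additional data are available. We use a data splitting strategy to estimate $D$ and $\Sigma^0$ separately. More precisely, we estimate $D$ using the whole dataset:

$$\widehat D=\mathrm{diag}\big(\widehat D_{11},\ldots,\widehat D_{dd}\big),\quad\text{with}\quad\widehat D_{jj}:=\widehat\sigma^M_j\cdot\frac{\widehat\sigma_1}{\widehat\sigma^M_1},\tag{2.13}$$

where $\widehat\sigma^M_j=\widehat\sigma^M(\{R_{tj}\}_{t=1}^T)$ for $j=1,\ldots,d$, and $\widehat\sigma_1=\big(\widehat{\mathrm{Var}}(\{R_{t1}\}_{t=1}^T)\big)^{1/2}$ is the Pearson sample standard deviation of $\{R_{t1}\}_{t=1}^T$. To estimate $\Sigma^0$, we extract the subsequence $R_{T-T_S+1},\ldots,R_T$ from the time series $R_1,\ldots,R_T$, where $T_S\asymp T^{1-\delta}$ and $\delta$ is a small enough absolute constant. From this subsequence, we calculate the Kendall's tau matrix $\widehat T^S$. Combining it with $\widehat D$, we obtain the robust covariance matrix estimator

$$\widehat\Sigma^S:=\widehat D\sin\Big(\frac{\pi}{2}\widehat T^S\Big)\widehat D.$$

We then estimate $w^\top\Sigma w$ by replacing $\widehat D_{\mathrm{est}}$, $\widehat\Sigma^0_{\mathrm{est}}$, and $\widehat\Sigma_{\mathrm{est}}$ in (2.6) by $\widehat D$, $\sin(\frac{\pi}{2}\widehat T^S)$, and $\widehat\Sigma^S$. This gives the Robust H-CLUB estimator

$$\widehat U^S(\gamma):=\Phi^{-1}(1-\gamma/2)\sqrt{\widehat\sigma_S^2/T_S},\tag{2.14}$$

where $\widehat\sigma_S^2$ is calculated with the circular block bootstrap method. Accordingly, we construct the confidence interval for the risk as

$$\big[w^\top\widehat\Sigma^Sw-\widehat U^S(\gamma),\,w^\top\widehat\Sigma^Sw+\widehat U^S(\gamma)\big].\tag{2.15}$$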
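The data-splitting estimator can be sketched end to end as below. This is only a sketch. The rounding $T_S=\lfloor T^{1-\delta}\rfloor$ is our assumption, since the text only requires $T_S\asymp T^{1-\delta}$. The function name is hypothetical, and the scaling factor uses the first coordinate, as in (2.13), rather than an averaged variant.

```python
import math
import statistics

def _sign(x):
    return (x > 0) - (x < 0)

def _quantile(xs, q):
    # Sample q-quantile X^(k) with k = min{t : t/T >= q}, as in (2.3).
    k = max(1, math.ceil(q * len(xs)))
    return sorted(xs)[k - 1]

def _mad(xs):
    # Sample median absolute deviation, as in (2.4).
    m = _quantile(xs, 0.5)
    return _quantile([abs(x - m) for x in xs], 0.5)

def _kendall(R):
    # Sample Kendall's tau matrix, as in (2.1).
    T, d = len(R), len(R[0])
    m = T * (T - 1) / 2.0
    return [[sum(_sign(R[t][j] - R[u][j]) * _sign(R[t][k] - R[u][k])
                 for t in range(T) for u in range(t + 1, T)) / m
             for k in range(d)] for j in range(d)]

def split_robust_covariance(R, delta=0.01):
    """Return (T_S, Sigma^S) for the Section 2.3 estimator.

    D-hat uses all T rows via (2.13); Kendall's tau uses only the last T_S rows.
    """
    T, d = len(R), len(R[0])
    T_S = max(2, math.floor(T ** (1.0 - delta)))  # assumed rounding of T^(1 - delta)
    sub = R[T - T_S:]                             # R_{T-T_S+1}, ..., R_T
    cols = [[row[j] for row in R] for j in range(d)]
    mads = [_mad(c) for c in cols]
    scale = statistics.stdev(cols[0]) / mads[0]   # sd_1 / MAD_1
    D = [scale * m for m in mads]
    tau = _kendall(sub)
    S = [[D[j] * D[k] * math.sin(math.pi / 2.0 * tau[j][k]) for k in range(d)]
         for j in range(d)]
    return T_S, S
```

Setting `delta=0.0` recovers the practical variant of Remark 2.2 below, which uses the full sample for both parts.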

Remark 2.2. In (2.13), the scaling factor can also be estimated with an averaged version similar to the one in Remark 2.1. We also note that the data splitting strategy is proposed mainly for the theoretical analysis. In practice, we can set $\delta=0$ and use the entire dataset both to calculate $\widehat\Sigma^S$ and to perform the block bootstrap.


3 Asymptotic Theory 

In this section we prove that the confidence intervals for $w^\top\Sigma w$ in the three settings of Section 2 attain the desired coverage probability. In other words, we prove that the Robust H-CLUB estimators proposed in (2.7), (2.11), and (2.14) are asymptotic $(1-\gamma)100\%$ confidence upper bounds for the risk. This problem reduces to calculating the limiting distributions of $w^\top(\widehat\Sigma_{\mathrm{est}}-\Sigma)w$ for $\widehat\Sigma_{\mathrm{est}}=\widehat\Sigma$, $\widehat\Sigma^H$, and $\widehat\Sigma^S$. In the sequel, we adopt the triangular array setting as in Fan and Peng (2004) and Greenshtein and Ritov (2004), and we allow the dimension $d$ to increase with the sample size $T$.

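We introduce several mixing conditions to measure the degree of dependence, starting with three mixing coefficients. For a $d$-dimensional stationary process $\{R_t\}_{t\in\mathbb{Z}}$, let $\mathcal{F}_a^b$ be the $\sigma$-algebra generated by $R_a,\ldots,R_b$ for $a\le b$. We define the $\alpha$-, $\beta$-, and $\phi$-mixing coefficients as follows:

$$\alpha(n):=\sup_{A\in\mathcal{F}_{-\infty}^0,\,B\in\mathcal{F}_n^\infty}\big|\mathbb{P}(A\cap B)-\mathbb{P}(A)\mathbb{P}(B)\big|,$$

$$\beta(n):=\mathbb{E}\Big[\sup_{A\in\mathcal{F}_n^\infty}\big|\mathbb{P}(A\mid\mathcal{F}_{-\infty}^0)-\mathbb{P}(A)\big|\Big],$$

$$\phi(n):=\sup_{B\in\mathcal{F}_{-\infty}^0,\,A\in\mathcal{F}_n^\infty,\,\mathbb{P}(B)>0}\big|\mathbb{P}(A\mid B)-\mathbb{P}(A)\big|.$$

For an arbitrary positive integer $n$, we have $\alpha(n)\le\beta(n)\le\phi(n)$ (Yoshihara, 1976).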

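Suppose that $\{R_1,\ldots,R_T\}$ is a subsequence of the stationary process $\{R_t\}_{t\in\mathbb{Z}}$. Let $F$ be the distribution function of $R_1$. For $a:=Dw=(a_1,\ldots,a_d)^\top$, let $g:\mathbb{R}^d\times\mathbb{R}^d\to\mathbb{R}$ be the kernel function

$$g(R_t,R_{t'}):=\frac{\pi}{2}\sum_{j\neq k}a_ja_k\cos\Big(\frac{\pi}{2}\tau_{jk}\Big)\mathrm{sign}(R_{tj}-R_{t'j})\,\mathrm{sign}(R_{tk}-R_{t'k}).\tag{3.1}$$

We further define the following three quantities, which will be useful in later sections:

$$g_1(R_1):=\int g(R_1,R_2)\,dF(R_2),\tag{3.2}$$

$$\theta:=\iint g(R_1,R_2)\,dF(R_1)\,dF(R_2)=a^\top\Big(\cos\Big(\frac{\pi}{2}T\Big)\circ\frac{\pi}{2}T\Big)a,\tag{3.3}$$

$$\sigma^2:=4\Big(\mathbb{E}g_1(R_1)^2-\theta^2+2\sum_{h=1}^\infty\big\{\mathbb{E}g_1(R_1)g_1(R_{1+h})-\theta^2\big\}\Big).\tag{3.4}$$

In the following, we assume that the elliptical time series model in Section 2 holds.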

3.1 Theory for Known Volatilities 

We make the following four assumptions, which regulate the portfolio allocation vector $w$ and the stationary process $\{R_t\}_{t\in\mathbb{Z}}$.

(A1) There exist absolute constants $C_1$ and $C_2$ such that $\|w\|_1\le C_1$ and $\|\Sigma\|_{\max}\le C_2$.
(A2) $\sigma$ is lower bounded by a positive absolute constant.
(A3) The process $\{R_t\}_{t\in\mathbb{Z}}$ is $\phi$-mixing with $\phi(n)\le\cdots$ for some $\epsilon>0$.
(A4) $\log d/T^{1/2}=o(1)$.

Assumption (A1) constrains the portfolio allocation vector $w$ to prevent extreme positions. It is a common assumption made for portfolio stability (Jagannathan and Ma, 2003; Fan et al., 2012, 2015). Assumption (A2) guarantees that the portfolio risk cannot be diversified away. This is mild given that returns are commonly assumed to follow a factor model (Chamberlain, 1983; Fan et al., 2015). Assumption (A3) is routinely used in time series analysis to capture the strength of serial dependence (Pan and Yao, 2008; Han and Liu, 2013b). Lastly, Assumption (A4) allows $d$ to grow nearly exponentially fast in $T$ and hence is mild.

In the setting of Section 2.1 and under Assumptions (A1)-(A4), we derive the limiting distribution of $w^\top(\widehat\Sigma-\Sigma)w$. The following theorem shows that $\sqrt{T}\,w^\top(\widehat\Sigma-\Sigma)w/\sigma$ is asymptotically normal.

Theorem 3.1 (CLT, known volatilities). Assume that (A0)-(A4) hold. Then, in the setting of Section 2.1, we have

$$\sqrt{T}\,w^\top(\widehat\Sigma-\Sigma)w/\sigma\rightsquigarrow N(0,1),$$

as both $T$ and $d$ go to infinity.

The following theorem verifies that $\widehat\sigma^2$, calculated with the circular block bootstrap, is a consistent estimator of $\sigma^2$. Combined with Theorem 3.1 and Slutsky's theorem, this result confirms that $\sqrt{T}\,w^\top(\widehat\Sigma-\Sigma)w/\widehat\sigma$ converges weakly to the standard Gaussian. Accordingly, the confidence interval in (2.8) attains the nominal coverage probability asymptotically.

Theorem 3.2 (bootstrap, known volatilities). Under Assumptions (A0)-(A4), we have

$$\widehat\sigma^2=\sigma^2\big(1+o_P(1)\big),$$

and accordingly, for any given $\gamma\in(0,1)$, as $T,d\to\infty$, we have

$$\mathbb{P}\Big(w^\top\Sigma w\in\big[w^\top\widehat\Sigma w-\widehat U(\gamma),\,w^\top\widehat\Sigma w+\widehat U(\gamma)\big]\Big)\to1-\gamma.$$

The above two theorems only assume that the marginal second moments exist. Therefore, 
the Robust H-CLUB estimator naturally handles heavy-tailed data. 

3.2 Theory with Additional Data 

In this section we study the setting of Section 2.2. Because $D$ is unknown, we need additional assumptions. First, the following three assumptions require that $d$ does not grow too fast relative to $T$, and that the given time series $\{X_t\}_{t\in\mathbb{Z}}$ (either $\{R_t\}_{t\in\mathbb{Z}}$ or $\{H_t\}_{t\in\mathbb{Z}}$) is $\phi$-mixing with exponentially decaying serial dependence.

• (A5). $\max\{\sqrt{\log d/T^{\cdots}},\ \log d/T^{\cdots}\}=o(1)$.
• (A6). The process $\{X_t\}_{t\in\mathbb{Z}}$ is $\phi$-mixing with $\phi(n)\le C_1\exp(-C_2n^r)$ for some absolute constants $C_1,C_2,r>0$.
• (A7). Letting $a=\max(1,1/r)$, we require that $\log d=\cdots$.

Recall that $\delta$, defined in (2.9), characterizes the length of the historical data. Second, we require that the $(4+\epsilon_1)$-th moments of the returns exist for some absolute constant $\epsilon_1>0$, and that the density functions are bounded away from zero around the median:

• (A8). For any $j\in\{1,\ldots,d\}$, $\mathbb{E}|X_j|^{4+\epsilon_1}\le C_0<\infty$ for some constants $\epsilon_1,C_0>0$.
• (A9). Let $f_j$ and $\widetilde f_j$ be the density functions of $X_j$ and $|X_j-Q(X_j;1/2)|$. For any $j\in\{1,\ldots,d\}$, we require $\inf_{|x-Q(f;1/2)|<\kappa}f(x)\ge\eta$ for some positive absolute constants $\kappa$ and $\eta$, and any $f\in\{f_j,\widetilde f_j\}$.

Under (A0)-(A2) and (A5)-(A9), the next theorem shows that $\sqrt{T}\,w^\top(\widehat\Sigma^H-\Sigma)w$ is asymptotically normal.

Theorem 3.3 (CLT, unknown volatilities with additional data). Assume that Assumptions (A0)-(A2) hold. In addition, assume that Assumptions (A5)-(A7) hold for both $\{R_t\}_{t\in\mathbb{Z}}$ and the additional data $\{H_t\}_{t\in\mathbb{Z}}$, and that Assumptions (A8)-(A9) hold for $\{H_t\}_{t\in\mathbb{Z}}$. Then, in the setting of Section 2.2, we have

$$\sqrt{T}\,w^\top(\widehat\Sigma^H-\Sigma)w/\sigma\rightsquigarrow N(0,1),$$

as both $T$ and $d$ go to infinity.

The next theorem shows that $\widehat\sigma_H^2$ is a consistent estimator of $\sigma^2$, so the confidence interval in (2.12) is valid.

Theorem 3.4 (bootstrap, unknown volatilities with additional data). Under the assumptions of Theorem 3.3, we have

$$\widehat\sigma_H^2=\sigma^2\{1+o_P(1)\},$$

and accordingly, for any given $\gamma\in(0,1)$, as $T,d\to\infty$, we have

$$\mathbb{P}\Big(w^\top\Sigma w\in\big[w^\top\widehat\Sigma^Hw-\widehat U^H(\gamma),\,w^\top\widehat\Sigma^Hw+\widehat U^H(\gamma)\big]\Big)\to1-\gamma.$$
3.3 Theory with Unknown Marginal Volatilities

Lastly, we study the setting of Section 2.3. In this setting, we use a data splitting strategy and make inference only on a subsequence of length $T_S$. The next theorem justifies this approach.

Theorem 3.5 (CLT, unknown marginal volatilities). Assume that Assumptions (A0)-(A2) hold and that Assumptions (A5)-(A9) hold for $\{R_t\}_{t\in\mathbb{Z}}$. Then, in the setting of Section 2.3, we have

$$\sqrt{T_S}\,w^\top(\widehat\Sigma^S-\Sigma)w/\sigma\rightsquigarrow N(0,1).$$

Furthermore, the bootstrap-based estimator $\widehat\sigma_S^2$ is a consistent estimator of $\sigma^2$.

Theorem 3.6 (bootstrap, unknown marginal volatilities). Under the assumptions of Theorem 3.5, we have

$$\widehat\sigma_S^2=\sigma^2\{1+o_P(1)\},$$

and accordingly, for any given $\gamma\in(0,1)$, as $T,d\to\infty$, we have

$$\mathbb{P}\Big(w^\top\Sigma w\in\big[w^\top\widehat\Sigma^Sw-\widehat U^S(\gamma),\,w^\top\widehat\Sigma^Sw+\widehat U^S(\gamma)\big]\Big)\to1-\gamma.$$

Remark 3.7. Compared to the method in Fan et al. (2015), the Robust H-CLUB estimator gains substantial robustness, since it only assumes that the $(4+\epsilon_1)$-th moments of the marginal returns exist. In contrast, Fan et al. (2015) require the tails to decay at an exponential rate (see, for example, Assumption 3.4 therein). Such assumptions are often too restrictive and are rarely satisfied in real applications. The Robust H-CLUB estimator gains the ability to handle heavy-tailed data at the cost of a small loss in efficiency. This loss is due to the data splitting strategy, which is an artifact of the proof. In practice, we find that the method introduced in Section 2.3 performs well.

The data splitting strategy allows the portfolio allocation vector to be random. More specifically, suppose that $\widehat w$ is calculated from the data $R_1,\ldots,R_{T-T_S}$. The next result shows that $\sqrt{T_S}\,\widehat w^\top(\widehat\Sigma^S-\Sigma)\widehat w$ is asymptotically normal under the assumptions outlined below.

Corollary 3.1. Under the assumptions of Theorem 3.5, let $\widehat w=(\widehat w_1,\ldots,\widehat w_d)^\top$ be an estimator of $w=(w_1,\ldots,w_d)^\top$ satisfying

$$\mathbb{P}\big(|\widehat w_j/w_j-1|>t\big)\le2\exp(-\cdots)\tag{3.5}$$

for some absolute constant $C$, any $j\in\{1,\ldots,d\}$, and any $t>0$. Then, as $T,d\to\infty$,

$$\sqrt{T_S}\,\widehat w^\top(\widehat\Sigma^S-\Sigma)\widehat w/\sigma\rightsquigarrow N(0,1).$$

In this case, a similar circular block bootstrap procedure can also be used to estimate the asymptotic variance of $\sqrt{T_S}\,\widehat w^\top(\widehat\Sigma^S-\Sigma)\widehat w$.
4 Simulations on Synthetic Data

In this section we examine the finite-sample performance of the Robust H-CLUB estimators on synthetically generated data with heavy tails and noise contamination. Following Fan et al. (2015), we calculate several statistics of the estimators to assess their quality. Our analysis shows that the Robust H-CLUB estimator performs well in all of the cases considered when compared to the full-confidence bound $\|w\|_1^2\|\widehat\Sigma_{\mathrm{est}}-\Sigma\|_{\max}$. We observe that the 95% confidence intervals given by our proposed method are much tighter than this bound. We also show that, in the presence of heavy-tailed data, the H-CLUB based on the robust estimators outperforms the H-CLUB based on the sample covariance matrix estimator proposed in Fan et al. (2012). In particular, the sample-covariance-based H-CLUB fails to achieve 95% coverage in the heavy-tailed setting, while the Robust H-CLUB estimator remains consistently reliable. Lastly, we show that the Robust H-CLUB estimators also perform competitively on Gaussian data.

4.1 Calibration and Parameter Selection 

To calibrate the parameters governing data generation in our model, we use the daily returns of the S&P 500's top 100 stocks ranked by market capitalization (as of June 29th, 2012), and the 3-month Treasury bill rates, sourced from the COMPUSTAT database (www.compustat.com) and the CRSP database (www.crsp.com), respectively. We consider the excess returns $\{y_t\}$ over the period from July 1, 2008 to June 29, 2012. We extract the following features:

1. $\{d_i\}_{i=1}^{100}$, with $d_i$ equal to the sample standard deviation of the $i$th stock.

2. The sample correlation matrix of the observations $\{y_t\}$.

From these, we extract the mean and variance of $\{d_i\}_{i=1}^{100}$, denoted by $\mu_d$ and $\sigma_d^2$, respectively. We also compute the average and standard deviation of all pairwise correlations, denoted by $\mu_{\Sigma^0}$ and $\sigma_{\Sigma^0}$, respectively. These parameters are later used to generate the correlation matrices and marginal variances.

We also have several tuning parameters to select:

- $T_H$, with $\delta_H=0.1$ as the parameter determining the amount of historical data available to the estimator $\widehat\Sigma^H$;
- $l=\lfloor T^{\epsilon_0}\rfloor$, with $\epsilon_0=0.8$ as the parameter controlling the block size in the block bootstrap;
- $N_{\mathrm{bootstrap}}=50$, the number of bootstrapped datasets generated;
- $T_S$, with $\delta=0.01$ as the parameter controlling the data splitting used in the estimator $\widehat\Sigma^S$.
4.2 Simulation

For each given gross exposure constraint $c:=\|w\|_1$, we set $T=300$ and let $d$ range from 50 to 500 in multiples of 50. For each value of $d$, we conduct 200 iterations of the same procedure: generate a model, synthesize data from that model, and then calculate estimates from the synthesized data. We collate the outputs across these 200 iterations to compare the performance of the different estimators.

The detailed procedure is described as follows: 

1. Generate {di}f^i independently from the Gamma distribution with mean and vari¬ 
ance cTj|. Dehne D as the diagonal matrix such that Yin = di. 

2. Generate entries of independently from the Gaussian distribution with 

mean and variance cr'^ot- We threshold these off-diagonal elements to be no 
greater than 0.95 and set the diagonals of to be 1. If the matrix is not positive 
dehnite, we use Higham’s algorithm (see, e.g. Higham (2002)) to make it so, while 
keeping the diagonals hxed at 1. 

3. Dehne the covariance matrix S = DS'^D. 

4. Generate independently from the multivariate t distribution with 5 degrees of 

freedom and covariance matrix S. Generate independent historical data from 

the multivariate t distribution with 5 degrees of freedom and covariance matrix D^. 

5. Add noise contamination to the data by selecting a random 1% of the elements in 

and multiplying each one by a random variable drawn independently from a 
Unif(l, 15) distribution. Do the same to 1% of the elements in This step can 

be regarded as the news arrivals on the hrms that cause their returns to jump. 

6. Galculate the covariance estimates given by the sample covariance matrix S and the 
robust estimators S, and S^, using the tuning parameters given in Section 

7. Generate 500 portfolio allocation vectors $\mathbf{w}$ according to the method outlined in Fan et al. (2015), so that they are approximately uniformly distributed on the manifold $\{\mathbf{w} : \|\mathbf{w}\|_1 = c,\ \mathbf{w}^\top\mathbf{1} = 1\}$.

¹ We find that the following minor alteration improves performance in practice: for the H-CLUB based on the robust estimator with known history, we take block-bootstrapped samples of both $\{\mathbf{H}_t\}$ and $\{\mathbf{R}_t\}$ when estimating the variance of $\mathbf{w}^\top\boldsymbol{\Sigma}_{\mathrm{est}}\mathbf{w}$. For this we use a block size parameter $l_H$, chosen entirely analogously to the block size $l$ of the block bootstrap performed on $\{\mathbf{R}_t\}$. We use this modification throughout Sections 4 and 5.





8. For each portfolio allocation, calculate the H-CLUB estimates corresponding to the estimators listed in Step 6. As a proof of concept, we also calculate the robust estimator with $T_S = T$, i.e., with no data-splitting performed.

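9. Over the 500 portfolios, compute the averages of the true risk $R(\mathbf{w}) := \sqrt{\mathbf{w}^\top\boldsymbol{\Sigma}\mathbf{w}}$, as well as $\Delta := |\mathbf{w}^\top(\boldsymbol{\Sigma}_{\mathrm{est}} - \boldsymbol{\Sigma})\mathbf{w}|$, the crude bound $\|\mathbf{w}\|_1^2\,\|\boldsymbol{\Sigma}_{\mathrm{est}} - \boldsymbol{\Sigma}\|_{\max}$, and $U(0.05) = 2\sqrt{\mathrm{Var}(\mathbf{w}^\top\boldsymbol{\Sigma}_{\mathrm{est}}\mathbf{w})}$ for each of the estimators $\boldsymbol{\Sigma}_{\mathrm{est}}$ considered.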
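Steps 1 to 5 can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the code behind the reported results: the Gamma and Gaussian parameters (`gamma_mean`, `gamma_var`, `offdiag_mean`, `offdiag_sd`) and the history length `n_hist` are hypothetical placeholders, and the positive-definiteness repair uses eigenvalue clipping followed by rescaling to unit diagonal rather than Higham's algorithm.

```python
import numpy as np


def multivariate_t(cov, n, nu, rng):
    """Draw n rows from a multivariate t (nu > 2) whose covariance equals `cov`."""
    # A t_nu vector with scale matrix Psi has covariance nu / (nu - 2) * Psi,
    # so Psi = (nu - 2) / nu * cov gives the target covariance.
    L = np.linalg.cholesky((nu - 2) / nu * cov)
    z = rng.standard_normal((n, cov.shape[0])) @ L.T
    w = rng.chisquare(nu, size=(n, 1))
    return z / np.sqrt(w / nu)


def contaminate(x, frac, rng, low=1.0, high=15.0):
    """Step 5: multiply a random `frac` of the entries by independent Unif(low, high) draws."""
    x = x.copy()
    n_bad = int(round(frac * x.size))
    flat_idx = rng.choice(x.size, size=n_bad, replace=False)
    rows, cols = np.unravel_index(flat_idx, x.shape)
    x[rows, cols] *= rng.uniform(low, high, size=n_bad)
    return x


def simulate(d, T, rng, n_hist=None, nu=5, frac=0.01,
             gamma_mean=1.0, gamma_var=0.1,      # hypothetical values
             offdiag_mean=0.0, offdiag_sd=0.1):  # hypothetical values
    n_hist = T if n_hist is None else n_hist     # assumed history length
    # Step 1: d_i ~ Gamma with the requested mean and variance.
    shape, scale = gamma_mean ** 2 / gamma_var, gamma_var / gamma_mean
    D = np.diag(rng.gamma(shape, scale, size=d))
    # Step 2: symmetric Gaussian off-diagonals, capped at 0.95, unit diagonal.
    upper = np.triu(rng.normal(offdiag_mean, offdiag_sd, size=(d, d)), k=1)
    C = np.minimum(upper + upper.T, 0.95)
    np.fill_diagonal(C, 1.0)
    vals, vecs = np.linalg.eigh(C)
    if vals.min() <= 0:
        # Simple repair (NOT Higham's algorithm): clip eigenvalues, rescale to unit diagonal.
        C = vecs @ np.diag(np.clip(vals, 1e-3, None)) @ vecs.T
        s = np.sqrt(np.diag(C))
        C = C / np.outer(s, s)
    # Step 3: covariance matrix Sigma = D Sigma_0 D.
    Sigma = D @ C @ D
    # Steps 4 and 5: heavy-tailed returns and history, each with 1% contamination.
    R = contaminate(multivariate_t(Sigma, T, nu, rng), frac, rng)
    H = contaminate(multivariate_t(D @ D, n_hist, nu, rng), frac, rng)
    return Sigma, R, H
```

For example, `simulate(200, 300, np.random.default_rng(0))` returns one synthetic model together with contaminated returns and history.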


We plot the averages of $\Delta$, the crude bound, and $U(0.05)$ against $d$ for every estimator considered, for $c = 1$, $c = 1.6$, and $c = 2$, to observe the effect of gross exposure on risk assessment.

Next, for $d = 200$ and $d = 500$, we calculate the following quantities over the 100,000 portfolios (500 portfolios for each of 200 synthetic datasets): the coverage proportion, defined as the fraction of portfolios for which the 95% confidence interval contains the true risk $R(\mathbf{w}) = (\mathbf{w}^\top\boldsymbol{\Sigma}\mathbf{w})^{1/2}$; the ratio of bounds, defined as

$$\mathrm{RE}_1 := \frac{\|\mathbf{w}\|_1^2\,\|\boldsymbol{\Sigma}_{\mathrm{est}} - \boldsymbol{\Sigma}\|_{\max}}{U(0.05)};$$

and the relative error, defined as

$$\mathrm{RE}_2 := \frac{U(0.05)}{2\,\mathbf{w}^\top\boldsymbol{\Sigma}\mathbf{w}}.$$

Again, we compute these for $c = 1$, 1.6, and 2. The measure $\mathrm{RE}_1$ compares the crude upper bound with the half-width of the 95% confidence interval, whereas $\mathrm{RE}_2$ is the half-width of the 95% confidence interval for the portfolio risk $(\mathbf{w}^\top\boldsymbol{\Sigma}\mathbf{w})^{1/2}$ divided by the portfolio risk itself. The former measures how conservative the crude upper bound is relative to the H-CLUB, and the latter measures how informative the constructed confidence interval is.
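For a single portfolio and estimator, the quantities entering the coverage proportion, $\mathrm{RE}_1$, and $\mathrm{RE}_2$ can be computed as in the following sketch. It assumes that an estimate `var_hat` of $\mathrm{Var}(\mathbf{w}^\top\boldsymbol{\Sigma}_{\mathrm{est}}\mathbf{w})$ (for example, from the block bootstrap) is already available; the function name `portfolio_metrics` is hypothetical. Coverage is recorded as $\Delta \le U(0.05)$, i.e., the interval $\mathbf{w}^\top\boldsymbol{\Sigma}_{\mathrm{est}}\mathbf{w} \pm U(0.05)$ contains the true variance.

```python
import numpy as np


def portfolio_metrics(w, Sigma, Sigma_est, var_hat):
    """Per-portfolio quantities of Section 4.2 for one estimator (sketch).

    var_hat: an estimate of Var(w' Sigma_est w), taken as given here.
    """
    q_true = w @ Sigma @ w
    q_est = w @ Sigma_est @ w
    delta = abs(q_est - q_true)                                      # Delta
    U = 2.0 * np.sqrt(var_hat)                                       # U(0.05)
    crude = np.abs(w).sum() ** 2 * np.abs(Sigma_est - Sigma).max()   # ||w||_1^2 ||.||_max
    return {
        "delta": delta,
        "U": U,
        "crude": crude,
        "covered": bool(delta <= U),   # true variance lies in q_est +/- U
        "RE1": crude / U,
        "RE2": U / (2.0 * q_true),
    }
```

Averaging `covered`, `RE1`, and `RE2` over portfolios and synthetic datasets gives the summaries reported in the tables below.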

Lastly, we repeat the previous calculations of coverage proportions, $\mathrm{RE}_1$, and $\mathrm{RE}_2$ in a setting where the data are generated from a Gaussian distribution without any noise contamination. That is, in Step 4 of the procedure above we replace the multivariate $t$ distribution with a Gaussian distribution, and we remove Step 5. This allows us to examine how much efficiency the robust estimators lose when the data are in fact normal. In this setting, we also calculate the ratio $U(0.05)/\Delta$ as a measure of how tight the H-CLUB is relative to the theoretical minimum bound.


4.3 Results 

In Figures 1 and 2, we plot the average risk estimation errors along with the estimated error bounds for gross exposure $c = 1$, 1.6, and 2, using the sample covariance matrix and the robust estimators as $\boldsymbol{\Sigma}_{\mathrm{est}}$. Note that $c = 1.6$ corresponds on average to 130% long positions and 30% short positions, which is commonly used in practice. The sample covariance matrix estimator $\mathbf{S}$, for which an H-CLUB was derived in Fan et al. (2015), is not robustified.

[Figure 1. Panel: Sample covariance estimator.]
Figure 1: Averages of $\Delta = |\mathbf{w}^\top(\boldsymbol{\Sigma}_{\mathrm{est}} - \boldsymbol{\Sigma})\mathbf{w}|$ (blue curve), $U(0.05) = 2\sqrt{\mathrm{Var}(\mathbf{w}^\top\boldsymbol{\Sigma}_{\mathrm{est}}\mathbf{w})}$ (dashed curve), and $\|\mathbf{w}\|_1^2\,\|\boldsymbol{\Sigma}_{\mathrm{est}} - \boldsymbol{\Sigma}\|_{\max}$ (red curve) for $c = 1.0$. The horizontal axis shows the dimension of the problem, i.e., the portfolio size; the vertical axis shows the averaged values.

From these plots, we see that 

• The dashed curve lies above the solid blue curve throughout, an indication of the validity of the 95% bound given by $U(0.05)$. Interestingly, this still holds for the sample covariance matrix estimator $\mathbf{S}$, but only in an average sense: as Table 1 shows, $\mathbf{S}$ fails to attain 95% coverage.

• The crude bound is much larger than either the true error $\Delta$ or the 95% confidence bound $U(0.05)$. This discrepancy increases with $d$, and also with $c$, as can be seen by comparing Figure 1 with Figure 2. Table 2 quantifies this.

• For large $d$, the crude bound for the sample covariance matrix estimator is almost 100 times larger than for any of the robust estimators. This suggests that the sample covariance matrix estimates $\boldsymbol{\Sigma}$ inaccurately in the presence of heavy tails and contamination.
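The crude bound dominates $\Delta$ by an elementary argument, recorded here as a short derivation (a standard inequality, not a result from the original text). For $\mathbf{A} := \boldsymbol{\Sigma}_{\mathrm{est}} - \boldsymbol{\Sigma}$,

$$|\mathbf{w}^\top\mathbf{A}\mathbf{w}| = \Big|\sum_{j,k} w_j w_k A_{jk}\Big| \le \max_{j,k}|A_{jk}| \sum_{j,k}|w_j|\,|w_k| = \|\mathbf{w}\|_1^2\,\|\mathbf{A}\|_{\max}.$$

Since the bound charges every pair $(j,k)$ the largest entrywise error, it scales with $\|\mathbf{w}\|_1^2 = c^2$, and $\|\mathbf{A}\|_{\max}$ is a maximum over $d^2$ entries; this is consistent with the growth in $c$ and $d$ noted above.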

Table 1 reports the coverage of each estimator, defined as the proportion of samples in which the 95% confidence interval captures the true variance $\mathbf{w}^\top\boldsymbol{\Sigma}\mathbf{w}$. All the robust estimators have coverage proportions of approximately 95%. However, the sample covariance matrix estimator $\mathbf{S}$ has substantially lower coverage; it is not sufficiently robust to give a valid bound in the current setting.

[Figure 2. Panels: Robust estimator (known marginal variance); Robust estimator (no data-splitting); Robust estimator (known history). Subfigures: (a) $c = 1.6$; (b) $c = 2$.]
Figure 2: Averages of $\Delta = |\mathbf{w}^\top(\boldsymbol{\Sigma}_{\mathrm{est}} - \boldsymbol{\Sigma})\mathbf{w}|$ (blue curve), $U(0.05) = 2\sqrt{\mathrm{Var}(\mathbf{w}^\top\boldsymbol{\Sigma}_{\mathrm{est}}\mathbf{w})}$ (dashed curve), and $\|\mathbf{w}\|_1^2\,\|\boldsymbol{\Sigma}_{\mathrm{est}} - \boldsymbol{\Sigma}\|_{\max}$ (red curve) for $c = 1.6$ and $c = 2$. The horizontal axis shows the dimension of the problem, i.e., the portfolio size; the vertical axis shows the averaged values.

We make further comparisons between the robust estimators we have proposed. Table 2 reports averages and standard deviations of the ratio $\mathrm{RE}_1 = \|\mathbf{w}\|_1^2\,\|\boldsymbol{\Sigma}_{\mathrm{est}} - \boldsymbol{\Sigma}\|_{\max}/U(0.05)$, the ratio between the crude bound and the H-CLUB. These values quantify some of the observations made on Figures 1 and 2; in particular, the ratio increases strongly with $c$ and weakly with $d$.

We observe that:

• The value of $\mathrm{RE}_1$ is considerably larger than 1, reflecting the fact that the confidence interval given by the Robust H-CLUB is much tighter than that given by the crude bound. In almost all cases, $\mathrm{RE}_1$ reflects a difference in scale of roughly an order of magnitude between the H-CLUB interval and the crude interval.
Table 1: Empirical coverage proportions of 95% confidence intervals when the data are drawn from a $t_5$ distribution with 1% noise contamination, computed over 200 samples with $T = 300$.

                                              d = 200                       d = 500
Estimator                             c = 1.0  c = 1.6  c = 2.0     c = 1.0  c = 1.6  c = 2.0
Sample covariance S                    81.88%   72.29%   69.31%      83.30%   82.24%   80.12%
Robust (data-splitting)                97.59%   95.26%   97.64%      99.00%   97.09%   95.52%
Robust, T_S = T (no data-splitting)    96.38%   95.70%   97.49%      98.18%   97.03%   95.03%
Robust (known history)                 93.87%   93.19%   95.23%      93.01%   92.84%   94.67%
Robust (known marginal variances)      94.21%   95.54%   96.40%      95.16%   93.41%   93.67%


• The ratio $\mathrm{RE}_1$ generally increases with our ability to estimate the marginal standard deviations accurately. For instance, with $d = 200$ and $c = 1.0$, it equals 9.88 for the robust estimator with known marginal variances, 5.87 with known history, 5.64 with $T_S = T$, and 5.57 with data-splitting. This ordering corresponds to the amount of information used to estimate the marginal standard deviations.

• The value of $\mathrm{RE}_1$ increases strongly with $c$ and weakly with $d$. This suggests that the accuracy benefits of using the H-CLUB over the crude bound are particularly substantial for larger portfolios and for those with higher gross exposure.

Table 3 summarizes the relative error $\mathrm{RE}_2$, which shows how informative our confidence intervals for the true portfolio risks are. As in Table 2, we show the mean and standard deviation of $\mathrm{RE}_2$ calculated over 200 simulations with 500 randomly generated portfolios per simulation (i.e., 100,000 portfolios in total).

Here we see a similar pattern as before. Values are generally better (here, smaller) when more information is available for estimating the marginal standard deviations. For instance, with $d = 200$ and $c = 1.0$, $\mathrm{RE}_2$ equals 0.022 for the robust estimator with known marginal variances, 0.462 with known history, 0.500 with $T_S = T$, and 0.513 with data-splitting. We also observe that $\mathrm{RE}_2$ does not appear to vary much with either $c$ or $d$. It is also substantially larger than the values reported in, e.g., Fan et al. (2015), presumably due to the heavier tails and the noise in the data here, which are absent from those settings. This difference can be seen immediately by comparing with Table 4. The last row of Table 3 suggests that the uninformative confidence intervals are mainly due to inaccurate estimation of the marginal variances in the presence of large random noise and heavy tails.

Table 2: Averages and standard deviations (in parentheses) of $\mathrm{RE}_1 := \|\mathbf{w}\|_1^2\,\|\boldsymbol{\Sigma}_{\mathrm{est}} - \boldsymbol{\Sigma}\|_{\max}/U(0.05)$ over 200 samples.

                                                     d = 200                                      d = 500
Estimator                             c = 1.0       c = 1.6        c = 2.0          c = 1.0       c = 1.6        c = 2.0
Robust (data-splitting)               5.57 (1.94)   14.73 (5.51)   21.62 (7.68)     6.63 (2.18)   17.55 (6.13)   27.50 (9.95)
Robust, T_S = T (no data-splitting)   5.64 (1.85)   14.54 (5.64)   21.90 (8.50)     6.70 (2.32)   17.47 (6.61)   27.57 (9.39)
Robust (known history)                5.87 (2.11)   14.65 (5.24)   22.44 (8.55)     6.93 (2.25)   18.54 (6.56)   27.22 (9.55)
Robust (known marginal variances)     9.88 (2.80)   25.43 (7.31)   38.85 (10.89)   12.29 (3.13)   32.19 (9.10)   48.62 (12.91)

Table 3: Averages and standard deviations (in parentheses) of $\mathrm{RE}_2 := U(0.05)/(2\,\mathbf{w}^\top\boldsymbol{\Sigma}\mathbf{w})$ over 200 samples.

                                                       d = 200                                          d = 500
Estimator                             c = 1.0         c = 1.6         c = 2.0          c = 1.0         c = 1.6         c = 2.0
Robust (data-splitting)               0.513 (0.609)   0.627 (0.880)   0.478 (0.534)    0.521 (0.586)   0.549 (0.606)   0.480 (0.540)
Robust, T_S = T (no data-splitting)   0.500 (0.594)   0.644 (0.906)   0.483 (0.554)    0.517 (0.595)   0.559 (0.626)   0.471 (0.531)
Robust (known history)                0.462 (0.485)   0.571 (0.837)   0.575 (0.691)    0.492 (0.604)   0.471 (0.555)   0.494 (0.573)
Robust (known marginal variances)     0.022 (0.002)   0.021 (0.002)   0.021 (0.002)    0.021 (0.002)   0.021 (0.002)   0.021 (0.002)

For our last set of results on synthetic data, we show in Table 4 that the robust estimators remain competitive with the sample covariance based estimator when the data are drawn from a Gaussian distribution without noise contamination. In this table we present coverage proportions, means of $\mathrm{RE}_1$ and $\mathrm{RE}_2$, and the mean of the ratio $U(0.05)/\Delta$ between the 95% H-CLUB and the quantity it bounds. These are calculated over 200 randomly generated models.


Table 4: Coverage proportions and means of $\mathrm{RE}_1$, $\mathrm{RE}_2$, and $U(0.05)/\Delta$ over 200 samples when returns are drawn from Gaussian distributions without noise contamination, using $d = 500$.

                                         Coverage              RE_1                    RE_2                 U(0.05)/Δ
c                                     1.0   1.6   2.0     1.0    1.6    2.0     1.0    1.6    2.0      1.0   1.6   2.0
Sample covariance S                  .948  .944  .927    8.10  21.22  33.17   4.01%  3.97%  4.01%    5.67  6.29  7.02
Robust (data-splitting)              .965  .954  .950    8.57  22.24  34.19   7.19%  7.13%  7.14%    7.88  5.86  7.09
Robust, T_S = T (no data-splitting)  .960  .951  .950    8.58  22.46  33.98   7.17%  7.06%  7.20%    7.42  5.93  8.88
Robust (known history)               .960  .953  .964    9.26  23.99  37.28   4.92%  5.01%  5.09%    6.97  7.14  5.47
Robust (known marginal variances)    .957  .949  .923   11.65  30.65  48.75   2.01%  2.00%  2.00%    7.05  6.76  5.82


5 An Empirical Study 

In this section we examine the behaviour of the Robust H-CLUB estimators when applied to real-world data. We use the daily excess returns of 100 industrial portfolios formed on size and book-to-market ratio, as available on the website of Kenneth French, restricted to the period from July 1, 2008 to June 29, 2012. For each 21-day period (nominal month), we use the preceding 21 days' data to estimate the covariance matrix via the Robust H-CLUB estimator with data-splitting, the Robust H-CLUB estimator with no data-splitting ($T_S = T$), and the Robust H-CLUB estimator with known history. For the matrix of additional observations used in the last estimator, we use the preceding 1.5 months (31 days) of returns data. For all robust estimators in this section, we set the block-size exponent of the bootstrap to 0.5 (i.e., $l = \lfloor T^{0.5}\rfloor$). All other parameters are as in the previous section. Finally, we also estimate the covariance via the sample covariance matrix estimator $\mathbf{S}$ for comparison.

We track the performance of the H-CLUB estimators on three portfolios: the equally weighted portfolio ($\mathbf{w} = (1/100, \ldots, 1/100)^\top$) and two minimum variance portfolios with gross exposure $c = 1$ and $c = 1.6$, given by

$$\mathbf{w} = \mathop{\arg\min}_{\mathbf{w}^\top\mathbf{1} = 1,\ \|\mathbf{w}\|_1 = c} \mathbf{w}^\top\boldsymbol{\Sigma}_{\mathrm{est}}\mathbf{w}.$$

On occasion the estimated covariance matrix is not positive definite, which causes problems in solving for the minimum variance portfolio. In these cases, we coerce the estimated covariance matrix to be positive definite using Higham's algorithm before calculating the minimum variance portfolio.

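The minimum variance portfolios are calculated at the start of each nominal month. The actual risk during the holding month for each $\mathbf{w}$ defined above is then

$$R(\mathbf{w}) = (\mathbf{w}^\top\boldsymbol{\Sigma}\mathbf{w})^{1/2} \quad\text{and}\quad \boldsymbol{\Sigma} = \frac{1}{21}\sum_{t=1}^{21}\mathbf{y}_t\mathbf{y}_t^\top,$$

where $\{\mathbf{y}_t\}_{t=1}^{21}$ are the centered daily returns over the holding month. This is calculated for each month in the four-year period of study.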
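A minimal sketch of the realized-risk computation follows. It assumes the $1/n$ normalization above (with $n = 21$ for a nominal month) and the $\sqrt{252}$ annualization used in Table 5; the function name `holding_month_risk` is hypothetical.

```python
import numpy as np


def holding_month_risk(w, Y, trading_days=252):
    """Realized risk of portfolio w over one holding month (sketch).

    Y: (n, d) array of daily returns for the month. Columns are centered here,
    and the covariance uses a 1/n normalization (an assumption).
    Returns (daily risk R(w), annualized risk sqrt(trading_days) * R(w)).
    """
    Yc = Y - Y.mean(axis=0)                 # centered daily returns y_t
    Sigma = Yc.T @ Yc / Yc.shape[0]         # (1/n) sum_t y_t y_t'
    risk = float(np.sqrt(w @ Sigma @ w))
    return risk, float(np.sqrt(trading_days) * risk)
```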

For each estimator and portfolio strategy, we consider five quantities, summarized by their means over the whole study period in Table 5. Comparing the first two columns of Table 5, we make several observations.

• The values of $\Delta$ are comparable among the four estimators considered. This suggests that all estimators are similar in their estimation of the covariance matrix $\boldsymbol{\Sigma}$, and that the differences between them lie in their ability to conduct accurate inference on $\boldsymbol{\Sigma}_{\mathrm{est}}$ (i.e., to construct a valid H-CLUB).

• The (non-robustified) sample covariance matrix estimator $\mathbf{S}$ fails to give a valid upper bound, as $U(0.05)$ is less than $\Delta$ throughout.

• For the robust estimators, $U(0.05)$ is greater than $\Delta$ in all cases except one. This is broadly consistent with $U(0.05)$ being a 95% upper bound on the estimation error of the portfolio variance for the robust estimators. In the single exception (the robust estimator with known history, on the minimum variance portfolio with $\|\mathbf{w}\|_1 = 1.6$), $U(0.05)$ falls below $\Delta$ by only a small margin.

Lastly, the estimated risk error $U(0.05)/\sqrt{4\,\mathbf{w}^\top\boldsymbol{\Sigma}_{\mathrm{est}}\mathbf{w}}$ is an H-CLUB estimate for the true risk error $|(\mathbf{w}^\top\boldsymbol{\Sigma}\mathbf{w})^{1/2} - (\mathbf{w}^\top\boldsymbol{\Sigma}_{\mathrm{est}}\mathbf{w})^{1/2}|$, as can be seen by applying the delta method to the results of, e.g., Theorem 3.6. The last two columns of Table 5 show that the robust estimators behave accordingly, with the estimated risk error bounding the true risk error in all cases. In contrast, the non-robustified sample covariance estimator does not yield a good upper bound: its estimated risk error falls below the true risk error in every case. This is further evidence of the strength of the proposed robust estimators in the presence of heavy-tailed or noisy data.

Table 5: Annualized true and estimated risk errors calculated on the 100 Fama-French portfolios (averages over the study period).

Strategy                    Δ (×10^-…)   U(0.05) (×10^-…)   True Risk   True Risk Error   Estimated Risk Error
Sample covariance matrix estimator S
  Equal weighted              2.310          1.939            27.36%         8.32%               6.62%
  Min. variance (c = 1)       1.289          0.743            19.52%         6.97%               4.19%
  Min. variance (c = 1.6)     0.760          0.312            15.25%         6.38%               2.66%
Robust estimator (data-splitting)
  Equal weighted              2.165          4.790            27.36%         8.35%              18.67%
  Min. variance (c = 1)       1.470          2.696            21.06%         8.41%              17.67%
  Min. variance (c = 1.6)     1.576          2.249            18.30%        13.05%              46.32%
Robust estimator, no data-splitting (T_S = T)
  Equal weighted              2.154          5.121            27.36%         8.32%              18.94%
  Min. variance (c = 1)       1.459          2.826            21.02%         8.34%              20.41%
  Min. variance (c = 1.6)     1.562          2.218            18.22%        12.86%              37.81%
Robust estimator (known history)
  Equal weighted              2.100          3.325            27.36%         7.69%              12.85%
  Min. variance (c = 1)       1.390          1.885            20.79%         7.63%              12.25%
  Min. variance (c = 1.6)     1.358          1.200            17.52%        10.99%              17.40%

Note: $\Delta = |\mathbf{w}^\top(\boldsymbol{\Sigma}_{\mathrm{est}} - \boldsymbol{\Sigma})\mathbf{w}|$, $U(0.05) = 2\{\mathrm{Var}(\mathbf{w}^\top\boldsymbol{\Sigma}_{\mathrm{est}}\mathbf{w})\}^{1/2}$, True Risk is $\sqrt{252}\times R(\mathbf{w})$, True Risk Error is $\sqrt{252}\times|(\mathbf{w}^\top\boldsymbol{\Sigma}_{\mathrm{est}}\mathbf{w})^{1/2} - (\mathbf{w}^\top\boldsymbol{\Sigma}\mathbf{w})^{1/2}|$, and Estimated Risk Error is $\sqrt{252}\times U(0.05)/\sqrt{4\,\mathbf{w}^\top\boldsymbol{\Sigma}_{\mathrm{est}}\mathbf{w}}$. The factor $\sqrt{252}$ converts the risks to annualized values.
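The passage from the variance bound $U(0.05)$ to the risk scale used in the last two columns of Table 5 is a first-order expansion of the square root. As a sketch, write $q = \mathbf{w}^\top\boldsymbol{\Sigma}\mathbf{w}$ and $\widehat q = \mathbf{w}^\top\boldsymbol{\Sigma}_{\mathrm{est}}\mathbf{w}$; then

$$\sqrt{\widehat q} - \sqrt{q} = \frac{\widehat q - q}{\sqrt{\widehat q} + \sqrt{q}} \approx \frac{\widehat q - q}{2\sqrt{\widehat q}}, \qquad\text{so}\qquad |\widehat q - q| \le U(0.05) \ \Longrightarrow\ |\sqrt{\widehat q} - \sqrt{q}| \lesssim \frac{U(0.05)}{\sqrt{4\widehat q}}.$$

The first equality is exact; the approximation $\sqrt{\widehat q} + \sqrt{q} \approx 2\sqrt{\widehat q}$ is accurate when the relative error $|\widehat q - q|/\widehat q$ is small.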

6 Conclusion and Discussion 

This paper considers the problem of assessing the risks of large portfolios in a robust manner. We consider three different settings depending on whether $\mathbf{D}$ is known or not, and propose three corresponding Robust H-CLUB approaches based on robust rank and quantile statistics. For the first time in the literature, we provide an inferential theory for these robust risk estimators. Compared to Fan et al. (2015), the proposed approaches do not require strong moment assumptions on the data. Both theoretical and empirical results verify that the Robust H-CLUB approaches are more appropriate for studying heavy-tailed asset returns.

In the present paper, we do not impose any structural assumption on the covariance matrix, such as the low-rank plus sparse structure induced by the factor model. Fan et al. (2015) propose methods based on the factor-based covariance matrix estimators of Fan et al. (2008) and Fan et al. (2013). A natural extension of Fan et al. (2013) is to use one of the proposed robust estimators, instead of the sample covariance matrix $\mathbf{S}$, as the pilot estimator in the POET algorithm (Fan et al., 2013). This yields another robust risk estimator. We plan to investigate the theoretical properties of such robust risk estimators and their limiting distributions in future work.

The results in this paper also raise a number of interesting questions for future research. One example is deriving the limiting distributions of functionals of the robust covariance estimator other than $\mathbf{w}^\top\boldsymbol{\Sigma}_{\mathrm{est}}\mathbf{w}$. For example, Han and Liu (2014a) study the limiting distribution of the max norm of such an estimator as $T, d \to \infty$ in the setting where the observations are mutually independent. It would be interesting to develop such asymptotic theory for multivariate time series.


7 Proofs 

In this section we provide the proofs of the results in Section 3. In the sequel, by Assumption (A1), we assume without loss of generality that $\|\mathbf{w}\| = 1$ and $\|\boldsymbol{\Sigma}\|_{\max} \le 1$.


7.1 Supporting Lemmas 


Lemma 7.1 (Kontorovich et al. (2008) and Mohri and Rostamizadeh (2010)). Let $f: \mathcal{X}^T \to \mathbb{R}$ be a measurable function that is $c$-Lipschitz with respect to the Hamming metric for some $c > 0$:

$$\sup_{t;\ x_1,\ldots,x_T,\ x_t'} \big|f(x_1,\ldots,x_t,\ldots,x_T) - f(x_1,\ldots,x_t',\ldots,x_T)\big| \le c,$$

and let $X_1,\ldots,X_T$ be a sequence of stationary $\phi$-mixing random variables. Then, for any $\epsilon > 0$, the following inequality holds:

$$\mathbb{P}\big\{|f(X_1,\ldots,X_T) - \mathbb{E}f(X_1,\ldots,X_T)| \ge \epsilon\big\} \le 2\exp\Big[-\frac{2\epsilon^2}{T c^2\{1 + 2\sum_{k=1}^{T}\phi(k)\}^2}\Big].$$


Lemma 7.2 (Yoshihara (1976)). Let $\{X_t\}_{t\in\mathbb{Z}}$ be a stationary process with distribution function $F$. For $T > m$, we define

$$U_T(g) := \binom{T}{m}^{-1}\sum_{1\le t_1<\cdots<t_m\le T} g(X_{t_1},\ldots,X_{t_m})$$

to be a U-statistic of order $m$ with kernel function $g$. Let the functions $g_i(\cdot)$ be defined as

$$g_i(x_1,\ldots,x_i) = \int g(x_1,\ldots,x_m)\,dF(x_{i+1})\cdots dF(x_m)$$

for $1 \le i \le m$, and let the parameters $\theta$ and $\sigma^2$ be defined as

$$\theta = \int g(x_1,\ldots,x_m)\,dF(x_1)\cdots dF(x_m), \qquad \sigma^2 = m^2\Big[\mathbb{E}g_1(X_1)^2 - \theta^2 + 2\sum_{h=1}^{\infty}\{\mathbb{E}g_1(X_1)g_1(X_{1+h}) - \theta^2\}\Big]. \tag{7.1}$$

Suppose there exists a constant $\delta > 0$ such that, for $r = 2 + \delta$, the following conditions hold:

1. $\int |g(x_1,\ldots,x_m)|^r\,dF(x_1)\cdots dF(x_m) \le M_0 < \infty$ for some constant $M_0$;

2. $\mathbb{E}|g(X_1,\ldots,X_m)|^r \le M_1$ for some constant $M_1$;

3. $\{X_t\}_{t\in\mathbb{Z}}$ is $\beta$-mixing with $\beta(n) = O(n^{-(2+\delta')/\delta'})$ for some $0 < \delta' < \delta$.

Assuming that the above conditions hold, we then have

$$\frac{\sqrt{T}\{U_T(g) - \theta\}}{\sigma} \xrightarrow{d} Z \quad\text{as } T \to \infty,$$

where $Z \sim N(0,1)$ is a standard Gaussian random variable.


Lemma 7.3 (Yoshihara (1976)). Let $\{X_t\}_{t\in\mathbb{Z}}$ be a $d$-dimensional stationary process with marginal distribution function $F$, and let $X_1,\ldots,X_T$ be a sequence of observations. Suppose $h(\cdot,\cdot): \mathcal{X}\times\mathcal{X} \to \mathbb{R}$ is a kernel function such that, for some constants $\zeta > 0$ and $H > 0$,

$$\int\!\!\int |h(x_1,x_2)|^{2+\zeta}\,dF(x_1)\,dF(x_2) \le H, \tag{7.2}$$

$$\int |h(x_i,x_{i+k})|^{2+\zeta}\,dF(x_i,x_{i+k}) \le H \quad\text{for all } k > 0,\ k \in \mathbb{Z}, \tag{7.3}$$

where $F(x_{i_1},x_{i_2})$ is the joint distribution function of $(X_{i_1},X_{i_2})$. For arbitrary random vectors $X$ and $Y$, we define

$$h_1(X) = \int h(X,Y)\,dF(Y) - \int\!\!\int h(X,Y)\,dF(X)\,dF(Y),$$

$$h_2(X,Y) = h(X,Y) - h_1(X) - h_1(Y) - \int\!\!\int h(X,Y)\,dF(X)\,dF(Y).$$

If the process is $\beta$-mixing with mixing coefficient $\beta(n) = O(n^{-(2+\zeta')/\zeta'})$ for a constant $\zeta' \in (0,\zeta)$, then, for the U-statistic

$$U_T(h_2) := \binom{T}{2}^{-1}\sum_{t_1<t_2} h_2(X_{t_1},X_{t_2}),$$

we have

$$\mathbb{E}\{T\,U_T(h_2)^2\} \le T\Big\{\frac{2}{T(T-1)}\Big\}^2 \sum_{i_1<i_2}\sum_{i_3<i_4}\big|\mathbb{E}\{h_2(X_{i_1},X_{i_2})\,h_2(X_{i_3},X_{i_4})\}\big| = O(T^{-\gamma}),$$

where $\gamma := \min\big(2(\zeta - \zeta')/\{\zeta'(2+\zeta)\},\ 1\big)$.


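Lemma 7.4. Let $\{X_t\}_{t\in\mathbb{Z}}$ be a $d$-dimensional stationary process with marginal distribution function $F$, let $X_1,\ldots,X_T$ be a sequence of observations, and let $X_1^*,\ldots,X_T^*$ be a block-bootstrapped sample with block length $l$ as defined in Section 2.1. For a kernel function $h: \mathcal{X}\times\mathcal{X} \to \mathbb{R}$, define

$$U_T(h) = \binom{T}{2}^{-1}\sum_{t_1<t_2} h(X_{t_1},X_{t_2}) \quad\text{and}\quad U_T^*(h) = \binom{T}{2}^{-1}\sum_{t_1<t_2} h(X_{t_1}^*,X_{t_2}^*)$$

to be the U-statistics based on the observed sample and the bootstrap sample, respectively. Suppose that $h$ satisfies (7.2) and (7.3), and that the process $\{X_t\}_{t\in\mathbb{Z}}$ is $\beta$-mixing with mixing coefficient $\beta(n) = O(n^{-(2+\zeta')/\zeta'})$ for a constant $\zeta' \in (0,\zeta)$. Then

$$\mathrm{Var}^*\{\sqrt{T}\,U_T^*(h)\} - \mathrm{Var}\{\sqrt{T}\,U_T(h)\} = o_P(1),$$

where $\mathrm{Var}^*$ is the variance operator of the resampling distribution $\mathbb{P}^*$ conditional on $X_1,\ldots,X_T$.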


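Proof. We define $u := \int\!\!\int h(X,Y)\,dF(X)\,dF(Y)$. Using Hoeffding's decomposition, we have

$$U_T^*(h) = u + \frac{2}{T}\sum_{t=1}^T h_1(X_t^*) + U_T^*(h_2).$$

Using the fact that, for any two random variables $X$ and $Y$, $\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X,Y) \le \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}$, we obtain

$$\mathrm{Var}^*\{\sqrt{T}U_T^*(h)\} \le \mathrm{Var}^*\Big\{\frac{2}{\sqrt{T}}\sum_{t=1}^T h_1(X_t^*)\Big\} + \mathrm{Var}^*\{\sqrt{T}U_T^*(h_2)\} + 2\sqrt{\mathrm{Var}^*\Big\{\frac{2}{\sqrt{T}}\sum_{t=1}^T h_1(X_t^*)\Big\}\,\mathrm{Var}^*\{\sqrt{T}U_T^*(h_2)\}}. \tag{7.4}$$

Similarly, using the fact that $\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X,Y) \ge \mathrm{Var}(X) + \mathrm{Var}(Y) - 2\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}$, we have

$$\mathrm{Var}^*\{\sqrt{T}U_T^*(h)\} \ge \mathrm{Var}^*\Big\{\frac{2}{\sqrt{T}}\sum_{t=1}^T h_1(X_t^*)\Big\} + \mathrm{Var}^*\{\sqrt{T}U_T^*(h_2)\} - 2\sqrt{\mathrm{Var}^*\Big\{\frac{2}{\sqrt{T}}\sum_{t=1}^T h_1(X_t^*)\Big\}\,\mathrm{Var}^*\{\sqrt{T}U_T^*(h_2)\}}. \tag{7.5}$$

By Theorem 2.3 of Shao and Yu (1993), applied to $h_1$, we have

$$\mathrm{Var}^*\Big\{\frac{2}{\sqrt{T}}\sum_{t=1}^T h_1(X_t^*)\Big\} - \mathrm{Var}\Big\{\frac{2}{\sqrt{T}}\sum_{t=1}^T h_1(X_t)\Big\} = o_P(1). \tag{7.6}$$

On the other hand, by Lemma 7.3, we have $\mathrm{Var}\{\sqrt{T}U_T(h_2)\} = o(1)$ and $\mathrm{Var}^*\{\sqrt{T}U_T^*(h_2)\} = o_P(1)$. Since $\mathrm{Var}\{(2/\sqrt{T})\sum_{t=1}^T h_1(X_t)\}$ is bounded under the stated mixing and moment conditions, (7.6) shows that the first term on the right-hand sides of (7.4) and (7.5) is $O_P(1)$, so the cross terms are $o_P(1)$. Combining these facts with (7.4) and (7.5), we have

$$\mathrm{Var}^*\{\sqrt{T}U_T^*(h)\} = \mathrm{Var}^*\Big\{\frac{2}{\sqrt{T}}\sum_{t=1}^T h_1(X_t^*)\Big\} + o_P(1). \tag{7.7}$$

Similar arguments yield that

$$\mathrm{Var}\{\sqrt{T}U_T(h)\} = \mathrm{Var}\Big\{\frac{2}{\sqrt{T}}\sum_{t=1}^T h_1(X_t)\Big\} + o(1). \tag{7.8}$$

Combining (7.7) and (7.8), we obtain

$$\mathrm{Var}^*\{\sqrt{T}U_T^*(h)\} - \mathrm{Var}\{\sqrt{T}U_T(h)\} = \mathrm{Var}^*\Big\{\frac{2}{\sqrt{T}}\sum_{t=1}^T h_1(X_t^*)\Big\} - \mathrm{Var}\Big\{\frac{2}{\sqrt{T}}\sum_{t=1}^T h_1(X_t)\Big\} + o_P(1).$$

Combining the above equation with (7.6) completes the proof. □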
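The bootstrap variance $\mathrm{Var}^*\{\sqrt{T}\,U_T^*(h)\}$ in Lemma 7.4 can be approximated by Monte Carlo, as in the sketch below. The resampling rule is an assumption: the scheme of Section 2.1 is not reproduced here, so the sketch uses a moving-block bootstrap (overlapping blocks of length `l` with uniformly random starts, concatenated and truncated to length $T$). The function names are hypothetical, and `h` must act elementwise on NumPy arrays.

```python
import numpy as np


def u_statistic(x, h):
    """Order-2 U-statistic: average of h(x[t1], x[t2]) over all pairs t1 < t2."""
    i, j = np.triu_indices(len(x), k=1)
    return float(np.mean(h(x[i], x[j])))


def moving_block_resample(x, l, rng):
    """Concatenate randomly started blocks of length l, truncated to len(x) (assumed scheme)."""
    T = len(x)
    n_blocks = -(-T // l)                                # ceil(T / l)
    starts = rng.integers(0, T - l + 1, size=n_blocks)   # starts in {0, ..., T - l}
    idx = (starts[:, None] + np.arange(l)).ravel()[:T]
    return x[idx]


def bootstrap_var_sqrtT_u(x, h, l, n_boot, rng):
    """Monte Carlo approximation of Var*{sqrt(T) U_T^*(h)}."""
    T = len(x)
    stats = [np.sqrt(T) * u_statistic(moving_block_resample(x, l, rng), h)
             for _ in range(n_boot)]
    return float(np.var(stats, ddof=1))
```

With the kernel `h = lambda a, b: 0.5 * (a - b) ** 2`, `u_statistic` reproduces the unbiased sample variance, which gives a quick sanity check. Lemma 7.4 states that, under its conditions, the resulting bootstrap variance is consistent for $\mathrm{Var}\{\sqrt{T}\,U_T(h)\}$.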

Lemma 7.5. Let $\{X_t\}_{t\in\mathbb{Z}}$ be a stationary sequence of $\phi$-mixing random vectors. Suppose the $\phi$-mixing coefficient satisfies Assumption (A3). Then we have

$$\|\mathbb{E}\widehat{\mathbf{T}} - \mathbf{T}\|_{\max} = O(1/T),$$

where $\widehat{\mathbf{T}}$ and $\mathbf{T}$ are the sample and population Kendall's tau matrices defined in (2.1).

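Proof. For any two constants $1 \le s < t \le T$, we have

$$\mathbb{P}(X_{tj} - X_{sj} > 0,\ X_{tk} - X_{sk} > 0) = \mathbb{P}(X_{tj} > X_{sj},\ X_{tk} > X_{sk}).$$

Let

$$-\infty = a_0 < -M < a_1 < \cdots < a_{h-1} < M < a_h = \infty \quad\text{and}\quad -\infty = b_0 < -M < b_1 < \cdots < b_{h-1} < M < b_h = \infty$$

be two predetermined real sequences, and write $I_{i_0} := [a_{i_0-1}, a_{i_0}]$ and $J_{j_0} := [b_{j_0-1}, b_{j_0}]$ for $i_0, j_0 = 1,\ldots,h$. Note that, given $\{X_{sj} \in I_{i_0}\}$, the event $\{X_{tj} > X_{sj}\}$ implies the event $\{X_{tj} > a_{i_0-1}\}$. This yields

$$\mathbb{P}(X_{tj} > X_{sj}, X_{tk} > X_{sk}) \le \sum_{i_0,j_0} \mathbb{P}(X_{tj} > a_{i_0-1}, X_{tk} > b_{j_0-1} \mid X_{sj} \in I_{i_0}, X_{sk} \in J_{j_0})\,\mathbb{P}(X_{sj} \in I_{i_0}, X_{sk} \in J_{j_0}).$$

On the other hand, given $\{X_{sj} \in I_{i_0}\}$, the event $\{X_{tj} > a_{i_0}\}$ implies the event $\{X_{tj} > X_{sj}\}$. Thus, we have

$$\mathbb{P}(X_{tj} > X_{sj}, X_{tk} > X_{sk}) \ge \sum_{i_0,j_0} \mathbb{P}(X_{tj} > a_{i_0}, X_{tk} > b_{j_0} \mid X_{sj} \in I_{i_0}, X_{sk} \in J_{j_0})\,\mathbb{P}(X_{sj} \in I_{i_0}, X_{sk} \in J_{j_0}).$$

Now we define

$$\psi_h^+ := \sum_{i_0,j_0} \mathbb{P}(X_{tj} > a_{i_0-1}, X_{tk} > b_{j_0-1})\,\mathbb{P}(X_{sj} \in I_{i_0}, X_{sk} \in J_{j_0}), \qquad \psi_h^- := \sum_{i_0,j_0} \mathbb{P}(X_{tj} > a_{i_0}, X_{tk} > b_{j_0})\,\mathbb{P}(X_{sj} \in I_{i_0}, X_{sk} \in J_{j_0}),$$

and let $\psi_h$ be either $\psi_h^-$ or $\psi_h^+$ according to the sign of $\mathbb{P}(X_{tj} > X_{sj}, X_{tk} > X_{sk}) - \psi_h^-$:

$$\psi_h := \begin{cases} \psi_h^- & \text{if } \mathbb{P}(X_{tj} > X_{sj}, X_{tk} > X_{sk}) \ge \psi_h^-, \\ \psi_h^+ & \text{otherwise.} \end{cases}$$

Without loss of generality, suppose that $\mathbb{P}(X_{tj} > X_{sj}, X_{tk} > X_{sk}) \ge \psi_h$ (the other case is handled symmetrically using the lower bound above). It follows that

$$\begin{aligned} |\mathbb{P}(X_{tj} > X_{sj}, X_{tk} > X_{sk}) - \psi_h| &= \mathbb{P}(X_{tj} > X_{sj}, X_{tk} > X_{sk}) - \psi_h \\ &\le \sum_{i_0,j_0}\big\{\mathbb{P}(X_{tj} > a_{i_0-1}, X_{tk} > b_{j_0-1} \mid X_{sj} \in I_{i_0}, X_{sk} \in J_{j_0}) - \mathbb{P}(X_{tj} > a_{i_0}, X_{tk} > b_{j_0})\big\}\,\mathbb{P}(X_{sj} \in I_{i_0}, X_{sk} \in J_{j_0}) \\ &\le \phi(t-s) + \max_{i_0,j_0}\big|\mathbb{P}(X_{tj} > a_{i_0-1}, X_{tk} > b_{j_0-1}) - \mathbb{P}(X_{tj} > a_{i_0}, X_{tk} > b_{j_0})\big|. \end{aligned}$$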
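Now let $h \to \infty$, $\max_{2\le i\le h-1}|a_i - a_{i-1}| \to 0$, $\max_{2\le i\le h-1}|b_i - b_{i-1}| \to 0$, and $M \to \infty$. By the definition of the $\phi$-mixing coefficient, we have

$$\Big|\mathbb{P}(X_{tj} > X_{sj}, X_{tk} > X_{sk}) - \int \mathbb{P}(X_{tj} > a, X_{tk} > b)\,dF(X_{sj} = a, X_{sk} = b)\Big| \le \phi(t-s). \tag{7.9}$$

Moreover, letting $X' = (X_1',\ldots,X_d')^\top$ have the same distribution as $X_1$ and be independent of $(X_s, X_t)$, we have

$$dF(X_{sj} = a, X_{sk} = b) = dF(X_j' = a, X_k' = b).$$

This yields

$$\int \mathbb{P}(X_{tj} > a, X_{tk} > b)\,dF(X_{sj} = a, X_{sk} = b) = \int \mathbb{P}(X_{tj} > a, X_{tk} > b)\,dF(X_j' = a, X_k' = b).$$

Plugging the above equation into (7.9), we obtain

$$\Big|\mathbb{P}(X_{tj} > X_{sj}, X_{tk} > X_{sk}) - \int \mathbb{P}(X_{tj} > a, X_{tk} > b)\,dF(X_j' = a, X_k' = b)\Big| \le \phi(t-s).$$

Note that, by the definition of conditional probability and the independence of $X'$ and $X_t$, we have

$$\int \mathbb{P}(X_{tj} > a, X_{tk} > b)\,dF(X_j' = a, X_k' = b) = \mathbb{P}(X_{tj} - X_j' > 0,\ X_{tk} - X_k' > 0).$$

Thus, combining the above two equations, we have

$$|\mathbb{P}(X_{tj} - X_{sj} > 0, X_{tk} - X_{sk} > 0) - \mathbb{P}(X_{tj} - X_j' > 0, X_{tk} - X_k' > 0)| \le \phi(t-s). \tag{7.10}$$

Using similar arguments, we can prove

$$|\mathbb{P}(X_{tj} - X_{sj} < 0, X_{tk} - X_{sk} < 0) - \mathbb{P}(X_{tj} - X_j' < 0, X_{tk} - X_k' < 0)| \le \phi(t-s), \tag{7.11}$$

$$|\mathbb{P}(X_{tj} - X_{sj} < 0, X_{tk} - X_{sk} > 0) - \mathbb{P}(X_{tj} - X_j' < 0, X_{tk} - X_k' > 0)| \le \phi(t-s), \tag{7.12}$$

$$|\mathbb{P}(X_{tj} - X_{sj} > 0, X_{tk} - X_{sk} < 0) - \mathbb{P}(X_{tj} - X_j' > 0, X_{tk} - X_k' < 0)| \le \phi(t-s). \tag{7.13}$$

By definition, we have $\tau_{jk} = \mathbb{E}\{\mathrm{sign}(X_{tj} - X_j')\,\mathrm{sign}(X_{tk} - X_k')\}$. Applying the definition of expectation, we have

$$\begin{aligned} \tau_{jk} = {} & \mathbb{P}(X_{tj} - X_j' > 0, X_{tk} - X_k' > 0) + \mathbb{P}(X_{tj} - X_j' < 0, X_{tk} - X_k' < 0) \\ & - \mathbb{P}(X_{tj} - X_j' > 0, X_{tk} - X_k' < 0) - \mathbb{P}(X_{tj} - X_j' < 0, X_{tk} - X_k' > 0). \end{aligned} \tag{7.14}$$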
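By the same reasoning, we have

$$\begin{aligned} \mathbb{E}\{\mathrm{sign}(X_{tj} - X_{sj})\,\mathrm{sign}(X_{tk} - X_{sk})\} = {} & \mathbb{P}(X_{tj} - X_{sj} > 0, X_{tk} - X_{sk} > 0) + \mathbb{P}(X_{tj} - X_{sj} < 0, X_{tk} - X_{sk} < 0) \\ & - \mathbb{P}(X_{tj} - X_{sj} > 0, X_{tk} - X_{sk} < 0) - \mathbb{P}(X_{tj} - X_{sj} < 0, X_{tk} - X_{sk} > 0). \end{aligned} \tag{7.15}$$

Now, by the definition of $\widehat\tau_{jk}$, we have

$$|\mathbb{E}\widehat\tau_{jk} - \tau_{jk}| = \Big|\frac{2}{T(T-1)}\sum_{s<t}\mathbb{E}\{\mathrm{sign}(X_{tj} - X_{sj})\,\mathrm{sign}(X_{tk} - X_{sk})\} - \tau_{jk}\Big| \le \frac{2}{T(T-1)}\sum_{s<t}\big|\mathbb{E}\{\mathrm{sign}(X_{tj} - X_{sj})\,\mathrm{sign}(X_{tk} - X_{sk})\} - \tau_{jk}\big|.$$

Plugging (7.14) and (7.15) into the above equation, and applying (7.10) to (7.13), we obtain

$$|\mathbb{E}\widehat\tau_{jk} - \tau_{jk}| \le \frac{2}{T(T-1)}\sum_{s<t}4\phi(t-s) = \frac{8}{T(T-1)}\sum_{t=1}^{T-1}(T-t)\phi(t) = O\Big(\frac{1}{T}\Big). \tag{7.16}$$

The last equality holds because, by Assumption (A3), $\sum_{t=1}^{\infty}\phi(t) < \infty$, so that

$$\sum_{t=1}^{T-1}(T-t)\phi(t) \le T\sum_{t=1}^{\infty}\phi(t) = O(T).$$

Since (7.16) holds uniformly over $j$ and $k$, this completes the proof. □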

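Lemma 7.6. Let $\{X_t\}_{t\in\mathbb{Z}}$ be a stationary sequence of $\phi$-mixing random vectors. Suppose the $\phi$-mixing coefficient satisfies Assumption (A3). Then we have

$$\|\widehat{\mathbf{T}} - \mathbf{T}\|_{\max} = O_P\Big(\sqrt{\frac{\log d}{T}}\Big),$$

where $\widehat{\mathbf{T}}$ and $\mathbf{T}$ are the sample and population Kendall's tau matrices based on $\{X_t\}_{t=1}^T$.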

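Proof. Consider the following function:

$$f_{jk}(X_1,\ldots,X_T) := \frac{2}{T-1}\sum_{t<t'}\mathrm{sign}(X_{tj} - X_{t'j})\,\mathrm{sign}(X_{tk} - X_{t'k}) = T\cdot\widehat\tau_{jk}.$$

Replacing $X_t$ by $X_t'$ changes only the $T-1$ summands that involve index $t$, each by at most 2, so

$$\begin{aligned} &|f_{jk}(X_1,\ldots,X_t,\ldots,X_T) - f_{jk}(X_1,\ldots,X_t',\ldots,X_T)| \\ &\quad = \frac{2}{T-1}\Big|\sum_{i\ne t}\mathrm{sign}(X_{ij} - X_{tj})\,\mathrm{sign}(X_{ik} - X_{tk}) - \sum_{i\ne t}\mathrm{sign}(X_{ij} - X_{tj}')\,\mathrm{sign}(X_{ik} - X_{tk}')\Big| \le \frac{2}{T-1}\{2(T-1)\} = 4. \end{aligned}$$

Thus, $f_{jk}$ is 4-Lipschitz with respect to the Hamming metric. By Lemma 7.1, we have

$$\mathbb{P}\big(T|\widehat\tau_{jk} - \mathbb{E}\widehat\tau_{jk}| \ge T\epsilon\big) \le 2\exp\Big[-\frac{T\epsilon^2}{8\{1 + 2\sum_{l=1}^{T}\phi(l)\}^2}\Big]$$

for any $\epsilon > 0$. Here $\sum_{l=1}^{\infty}\phi(l) < \infty$ is guaranteed by Assumption (A3). Thus, by the union bound, we have

$$\mathbb{P}\big(\|\widehat{\mathbf{T}} - \mathbb{E}\widehat{\mathbf{T}}\|_{\max} > \epsilon\big) \le \sum_{j,k=1}^{d}\mathbb{P}\big(|\widehat\tau_{jk} - \mathbb{E}\widehat\tau_{jk}| > \epsilon\big) \le 2\exp\Big[2\log d - \frac{T\epsilon^2}{8\{1 + 2\sum_{l=1}^{\infty}\phi(l)\}^2}\Big].$$

Setting $\epsilon = \sqrt{24\{1 + 2\sum_{l=1}^{\infty}\phi(l)\}^2\log d/T}$, the right-hand side equals $2\exp(2\log d - 3\log d) = 2/d$, so we have

$$\|\widehat{\mathbf{T}} - \mathbb{E}\widehat{\mathbf{T}}\|_{\max} = O_P\Big(\sqrt{\frac{\log d}{T}}\Big).$$

Combining the above equation with Lemma 7.5 completes the proof. □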
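The sample Kendall's tau matrix used in Lemmas 7.5 and 7.6 can be computed as in the following NumPy sketch, which implements $\widehat\tau_{jk} = \frac{2}{T(T-1)}\sum_{s<t}\mathrm{sign}(X_{tj} - X_{sj})\,\mathrm{sign}(X_{tk} - X_{sk})$ as given in the proof above. Any further transformation of $\widehat{\mathbf{T}}$ applied in (2.1) is not reproduced, and the function name is hypothetical.

```python
import numpy as np


def kendall_tau_matrix(X):
    """Sample Kendall's tau matrix of a (T, d) array:
    2 / (T (T - 1)) * sum_{s<t} sign(X_t - X_s) sign(X_t - X_s)^T."""
    T, d = X.shape
    acc = np.zeros((d, d))
    for s in range(T - 1):
        S = np.sign(X[s + 1:] - X[s])   # sign differences for all t > s (0-based rows)
        acc += S.T @ S                  # adds sign_j * sign_k over those pairs
    return 2.0 * acc / (T * (T - 1))
```

Replacing a single row of `X` changes `T * kendall_tau_matrix(X)` by at most 4 in every entry, which is the bounded-difference property that the proof of Lemma 7.6 feeds into Lemma 7.1.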

Lemma 7.7 (Theorem 1 in Doukhan and Neumann (2007)). Suppose that $X_1,\ldots,X_T$ are real-valued random variables with mean 0, defined on a common probability space $(\Omega,\mathcal{A},\mathbb{P})$. Let $\Psi: \mathbb{N}^2 \to \mathbb{N}$ be one of the following functions:

(a) $\Psi(u,v) = 2v$,

(b) $\Psi(u,v) = u + v$,

(c) $\Psi(u,v) = uv$,

(d) $\Psi(u,v) = \alpha(u+v) + (1-\alpha)uv$ for some $\alpha \in (0,1)$.

We assume that there exist constants $K, M, L_1, L_2 > 0$, $a, b \ge 0$, and a non-increasing sequence of real coefficients $\{\rho(n)\}_{n\ge0}$ such that, for any $u$-tuple $(s_1,\ldots,s_u)$ and $v$-tuple $(t_1,\ldots,t_v)$ with $1 \le s_1 \le \cdots \le s_u \le t_1 \le \cdots \le t_v \le T$, the following inequality holds:

$$\Big|\mathrm{Cov}\Big(\prod_{i=1}^{u}X_{s_i},\ \prod_{j=1}^{v}X_{t_j}\Big)\Big| \le K^2M^{u+v-2}\{(u+v)!\}^b\,\Psi(u,v)\,\rho(t_1 - s_u), \tag{7.17}$$

where, for the sequence $\{\rho(n)\}_{n\ge0}$, we require that

$$\sum_{s=0}^{\infty}(s+1)^k\rho(s) \le L_1L_2^k(k!)^a \quad\text{for all } k \ge 0. \tag{7.18}$$

We also assume that the following moment condition holds:

$$\mathbb{E}|X_t|^k \le (k!)^bM^k \quad\text{for all } t = 1,\ldots,T \text{ and } k \in \mathbb{N}. \tag{7.19}$$

Let $S_T = \sum_{t=1}^{T}X_t$. Then, for all $x > 0$, we have

$$\mathbb{P}(S_T \ge x) \le \exp\Big(-\frac{x^2}{C_1T + C_2x^{(2a+2b+3)/(a+b+2)}}\Big), \tag{7.20}$$

where $C_1$ and $C_2$ are constants depending only on $K$, $M$, $L_1$, $L_2$, $a$, and $b$; in particular,

$$C_2 = 2\{ML_2(K^2 \vee 2)\}^{1/(a+b+2)}. \tag{7.21}$$


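Lemma 7.8. Let $\{X_t\}_{t\in\mathbb{Z}}$ be a $d$-dimensional stationary $\phi$-mixing process satisfying Assumptions (A6), (A7), and (A9). Let $\widehat{\mathbf{R}} = \mathrm{diag}(\widehat\sigma_{M,1},\ldots,\widehat\sigma_{M,d})$ be the diagonal matrix of sample median absolute deviations based on $\{X_t\}_{t=1}^T$, and let $\mathbf{R} = \mathrm{diag}\{\sigma_M(X_{11}),\ldots,\sigma_M(X_{1d})\}$ be its population counterpart. Then we have

$$\|\widehat{\mathbf{R}} - \mathbf{R}\|_{\max} = O_P\Big(\sqrt{\frac{\log d}{T}}\Big).$$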



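Proof. We first focus on a marginal process $\{X_{tj}\}_{t=1}^T$. For notational brevity, we suppress the index $j$ and denote the process by $\{X_t\}_{t=1}^T$. Define $X = X_1$. Let $F$ be the distribution function of $X$, let $F_T$ be the empirical distribution function of $\{X_t\}_{t=1}^T$, and let $F_T^{-1}(q) := Q(\{X_t\}; q)$ for any $q \in [0,1]$. By the definition of $Q(\cdot)$ in (2.3), we have, for any $\epsilon \in [0,1]$,

$$\epsilon \le F_T\{F_T^{-1}(\epsilon)\} \le \epsilon + \frac{1}{T}.$$

This implies that

$$\mathbb{P}\{Q(\{X_t\};q) - Q(X;q) > u\} = \mathbb{P}\{F_T^{-1}(q) > F^{-1}(q) + u\} \le \mathbb{P}\Big[q + \frac{1}{T} > F_T\{u + F^{-1}(q)\}\Big].$$

By the definition of $F_T$, we further have

$$\begin{aligned} \mathbb{P}\{Q(\{X_t\};q) - Q(X;q) > u\} &\le \mathbb{P}\Big[\sum_{t=1}^{T}I\{X_t \le F^{-1}(q) + u\} < Tq + 1\Big] \\ &= \mathbb{P}\Big(\sum_{t=1}^{T}\big[-I\{X_t \le F^{-1}(q) + u\} + F\{F^{-1}(q) + u\}\big] > T\Big[F\{F^{-1}(q) + u\} - q - \frac{1}{T}\Big]\Big). \end{aligned}$$

Since $\{X_t\}_{t\in\mathbb{Z}}$ is $\phi$-mixing, the process $\{-I\{X_t \le F^{-1}(q) + u\} + F\{F^{-1}(q) + u\}\}_{t\in\mathbb{Z}}$ is also $\phi$-mixing. By Lemma 6 in Doukhan and Louhichi (1999), this process satisfies (7.17) with $K = 2$, $M = 1$, $b = 0$, any of the four $\Psi$ functions, and

$$\rho(n) = \phi(n) \le C_1'\exp(-C_2'n^r).$$

By Proposition 8 in Doukhan and Neumann (2007), (7.18) is satisfied with $a = \max(1, 1/r)$ and some constants $L_1$ and $L_2$. Since $-I\{X_t \le F^{-1}(q) + u\} + F\{F^{-1}(q) + u\}$ is bounded, (7.19) is also satisfied with $b = 0$. Thus, applying Lemma 7.7 with $x$ replaced by $T[F\{F^{-1}(q) + u\} - q - 1/T]$ and $b = 0$, we have

$$\mathbb{P}\{Q(\{X_t\};q) - Q(X;q) > u\} \le \exp\Big[-\psi\Big(F\{F^{-1}(q) + u\} - q - \frac{1}{T}\Big)\Big] \tag{7.22}$$

for $F\{F^{-1}(q) + u\} - q - 1/T > 0$, where

$$\psi(x) := \frac{Tx^2}{C_1 + C_2T^{(a+1)/(a+2)}x^{(2a+3)/(a+2)}}$$

for $x > 0$, $a = \max(1, 1/r)$, and the constants $C_1$ and $C_2$ of Lemma 7.7. On the other hand, we have

$$\begin{aligned} \mathbb{P}\{Q(\{X_t\};q) - Q(X;q) < -u\} &= \mathbb{P}\{F_T^{-1}(q) - F^{-1}(q) < -u\} \le \mathbb{P}\big[q \le F_T\{F^{-1}(q) - u\}\big] \\ &= \mathbb{P}\Big(\sum_{t=1}^{T}\big[I\{X_t \le F^{-1}(q) - u\} - F\{F^{-1}(q) - u\}\big] \ge T\big[q - F\{F^{-1}(q) - u\}\big]\Big). \end{aligned}$$

By similar arguments, we have

$$\mathbb{P}\{Q(\{X_t\};q) - Q(X;q) < -u\} \le \exp\big(-\psi\big[q - F\{F^{-1}(q) - u\}\big]\big). \tag{7.23}$$

Combining (7.22) and (7.23), we have

$$\mathbb{P}\{|Q(\{X_t\};q) - Q(X;q)| > u\} \le \exp\Big[-\psi\Big(F\{F^{-1}(q) + u\} - q - \frac{1}{T}\Big)\Big] + \exp\big(-\psi\big[q - F\{F^{-1}(q) - u\}\big]\big) \tag{7.24}$$

for $F\{F^{-1}(q) + u\} - q - 1/T > 0$.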

Next, we continue to derive exponential tail probabilities for aM({^i}^i)- We write 
m := Q{{Xt}/Li, 1/2) and m := Q(Wi; 1/2) to be the sample and population medians. Let 
Fi and F 2 be the distribution functions of X and \X — Q{X-, 1/2)|. By the dehnition of o^m, 
we have 


I1’{Sm({^J?’.i) 

- crM(W) > u 

} = p{q(. 

{|.V-a|}0))-Q(|A'-m|;i) 

> u| 

1 

VI 


- m - 

1 

to 1 1= 

V 


1 

VI 


\X-m\;^ 

) ^ 1} +P(l^-"i| > !)• 

(7.25) 
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On the other hand, using the same technique, we have 


<P 

<P 


P<| aM ( ) - auiX) < ^ = P 

T 1 
t=i 2 


\Xt - m\ 


|q(^||X 4 - m\ 


\m — ml 


\Xt-m\ 


;=i’ 2 


t=i’ 2 


Q[\X-m\;-] < -u 


u 


Q{\X < -u 


Q(|X — m|;-) <—-^+P(|m — m| > - 


u 


(7.26) 


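Combining (7.25) and (7.26), we have

$$P\{|\hat\sigma_M(\{X_t\}_{t=1}^T) - \sigma_M(X)| > u\} \le P\Big\{\Big|Q\big(\{|X_t - m|\}_{t=1}^T; \tfrac12\big) - Q\big(|X - m|; \tfrac12\big)\Big| > \frac u2\Big\} + 2P\Big(|\hat m - m| > \frac u2\Big). \tag{7.27}$$

Now applying Inequality (7.24), we have

$$
\begin{aligned}
P\Big\{\Big|Q\big(\{|X_t - m|\}_{t=1}^T; \tfrac12\big) - Q\big(|X - m|; \tfrac12\big)\Big| > \frac u2\Big\}
&\le \exp\Big[-\psi\Big(F_2\{F_2^{-1}(\tfrac12) + \tfrac u2\} - \tfrac12 - \tfrac1T\Big)\Big] + \exp\Big[-\psi\Big(\tfrac12 - F_2\{F_2^{-1}(\tfrac12) - \tfrac u2\}\Big)\Big]\\
&\le \exp\Big\{-\psi\Big(\frac{\eta u}{2} - \frac1T\Big)\Big\} + \exp\Big\{-\psi\Big(\frac{\eta u}{2}\Big)\Big\},
\end{aligned}
\tag{7.28}
$$

whenever $F_2\{F_2^{-1}(1/2) + u/2\} - 1/2 > 1/T$. Here the last inequality is due to Assumption (A9) and the fact that $\psi$ is nondecreasing. Similarly, we also have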


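$$
\begin{aligned}
P\Big(|\hat m - m| > \frac u2\Big) &\le \exp\Big[-\psi\Big(F_1\{F_1^{-1}(\tfrac12) + \tfrac u2\} - \tfrac12 - \tfrac1T\Big)\Big] + \exp\Big[-\psi\Big(\tfrac12 - F_1\{F_1^{-1}(\tfrac12) - \tfrac u2\}\Big)\Big]\\
&\le \exp\Big\{-\psi\Big(\frac{\eta u}{2} - \frac1T\Big)\Big\} + \exp\Big\{-\psi\Big(\frac{\eta u}{2}\Big)\Big\},
\end{aligned}
\tag{7.29}
$$

whenever $F_1\{F_1^{-1}(1/2) + u/2\} - 1/2 > 1/T$. Again, the last inequality is due to Assumption (A9) and the fact that $\psi$ is nondecreasing. Here we recall that $F_1$ and $F_2$ are the distribution functions of $X$ and $|X - Q(X;1/2)|$. Combining Inequalities (7.27), (7.28), and (7.29), we have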


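$$P\{|\hat\sigma_M(\{X_t\}_{t=1}^T) - \sigma_M(X)| > u\} \le 3\exp\Big\{-\psi\Big(\frac{\eta u}2 - \frac1T\Big)\Big\} + 3\exp\Big\{-\psi\Big(\frac{\eta u}2\Big)\Big\} \le 6\exp\Big\{-\psi\Big(\frac{\eta u}2 - \frac1T\Big)\Big\},$$

whenever $0 < u/2 < \kappa$ and $\eta u/2 > 1/T$. Now we switch the focus back to the entire matrix $\hat{\mathbf R}$. By the subadditivity of probability measures, we have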

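$$P(\|\hat{\mathbf R} - \mathbf R\|_{\max} > u) \le \sum_{j=1}^{d} P\{|\hat\sigma_M(\{X_{tj}\}_{t=1}^T) - \sigma_M(X_{\cdot j})| > u\} \le 6\exp\Big\{2\log d - \psi\Big(\frac{\eta u}2 - \frac1T\Big)\Big\}. \tag{7.30}$$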


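We recall that, by the definition of the function $\psi(\cdot)$, we have

$$\psi\Big(\frac{\eta u}2 - \frac1T\Big) = \frac{T(\eta u/2 - 1/T)^2}{C_1 + C_2T^{(a+1)/(a+2)}(\eta u/2 - 1/T)^{(2a+3)/(a+2)}}.$$

To simplify the denominator on the right-hand side of the above equation, we require that

$$C_1 \ge C_2T^{(a+1)/(a+2)}\Big(\frac{\eta u}{2} - \frac1T\Big)^{(2a+3)/(a+2)}. \tag{7.31}$$

Then we have $\psi(\eta u/2 - 1/T) \ge T(\eta u/2 - 1/T)^2/(2C_1)$. Plugging this into (7.30), we obtain

$$P(\|\hat{\mathbf R} - \mathbf R\|_{\max} > u) \le 6\exp\Big\{2\log d - \frac{T}{2C_1}\Big(\frac{\eta u}2 - \frac1T\Big)^2\Big\}. \tag{7.32}$$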


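Next, we select a proper $u$ to derive the rate of convergence. To this end, we set

$$\frac{T}{2C_1}\Big(\frac{\eta u}2 - \frac1T\Big)^2 = 3\log d.$$

This leads to

$$u = \frac2\eta\Big(\sqrt{\frac{6C_1\log d}{T}} + \frac1T\Big). \tag{7.33}$$

Plugging the above equation into (7.31), we get

$$C_1 \ge C_2(6C_1)^{(2a+3)/(2a+4)}\,T^{-1/(2a+4)}(\log d)^{(2a+3)/(2a+4)}.$$

Thus, (7.31) holds as long as $\log d = o(T^{1/(2a+3)})$. By Assumption (A7), (7.31) holds. Plugging (7.33) into (7.32), we get

$$P\Big\{\|\hat{\mathbf R} - \mathbf R\|_{\max} > \frac2\eta\Big(\sqrt{\frac{6C_1\log d}{T}} + \frac1T\Big)\Big\} \le \frac6d.$$

Thus, as $T$ and $d$ both go to infinity, we have

$$\|\hat{\mathbf R} - \mathbf R\|_{\max} = O_P\Big(\sqrt{\frac{\log d}{T}}\Big).$$

This completes the proof. □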
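The choice of $u$ in (7.33) can be sanity-checked numerically. The sketch below assumes illustrative values for the unspecified constants $C_1$, $C_2$, $\eta$ and for $a$ (here $a = 1$, i.e., $r \ge 1$); it verifies that (7.31) holds for a large $T$, that $\psi(\eta u/2 - 1/T) \ge T(\eta u/2 - 1/T)^2/(2C_1) = 3\log d$, and that the right-hand side of (7.32) then equals $6/d$.

```python
import math

def psi(x, T, C1, C2, a):
    """psi(x) = T x^2 / (C1 + C2 T^{(a+1)/(a+2)} x^{(2a+3)/(a+2)}) for x > 0."""
    return T * x**2 / (C1 + C2 * T ** ((a + 1) / (a + 2)) * x ** ((2 * a + 3) / (a + 2)))

def check_rate(T, d, eta, C1, C2, a):
    """Plug u from (7.33) into (7.31) and (7.32); return the pieces of the argument."""
    u = (2 / eta) * (math.sqrt(6 * C1 * math.log(d) / T) + 1 / T)    # (7.33)
    x = eta * u / 2 - 1 / T                                            # = sqrt(6 C1 log d / T)
    cond_731 = C1 >= C2 * T ** ((a + 1) / (a + 2)) * x ** ((2 * a + 3) / (a + 2))
    quad = T * x**2 / (2 * C1)                                         # lower bound on psi, = 3 log d
    tail = 6 * math.exp(2 * math.log(d) - quad)                        # right-hand side of (7.32)
    return u, cond_731, psi(x, T, C1, C2, a) >= quad, tail

u, cond, psi_ok, tail = check_rate(T=10**10, d=100, eta=0.5, C1=1.0, C2=1.0, a=1.0)
print(u, cond, psi_ok, tail)   # tail is 6/d = 0.06 up to rounding
```

For smaller $T$ (with the same constants) the flag for (7.31) can fail, which mirrors the growth restriction on $\log d$ relative to $T$.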
Lemma 7.9. Let $\{\mathbf X_t\}_{t\in\mathbb Z}$ be a $d$-dimensional stationary process satisfying Assumptions (A6)-(A9). We then have

$$\|\hat{\mathbf D} - \mathbf D\|_{\max} = O_P\Big(\sqrt{\frac{\log d}{T}}\Big),$$

where $\mathbf D$ is defined in Equation (2.13).

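Proof. Define $\hat{\mathbf R} = \mathrm{diag}(\hat\sigma_{M,1}, \ldots, \hat\sigma_{M,d})$, $\mathbf R = \mathrm{diag}\{\sigma_M(X_{\cdot 1}), \ldots, \sigma_M(X_{\cdot d})\}$, $\hat c = \hat\sigma_1/\hat\sigma_{M,1}$, and $c = \sqrt{\Sigma_{11}}/\sigma_M(X_{\cdot 1})$, where $X_{\cdot j}$ denotes the $j$th coordinate of $\mathbf X_t$. We have

$$\|\hat{\mathbf D} - \mathbf D\|_{\max} = \|\hat c\hat{\mathbf R} - c\mathbf R\|_{\max} \le \|\hat c(\hat{\mathbf R} - \mathbf R)\|_{\max} + \|(\hat c - c)\mathbf R\|_{\max} \le \hat c\|\hat{\mathbf R} - \mathbf R\|_{\max} + C|\hat c - c|, \tag{7.34}$$

where $C$ is a constant bounding $\|\mathbf R\|_{\max}$.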


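By Lemma 7.8, we have

$$\|\hat{\mathbf R} - \mathbf R\|_{\max} = O_P\Big(\sqrt{\frac{\log d}{T}}\Big). \tag{7.35}$$

Thus, specifically, we have

$$\hat\sigma_{M,1} \xrightarrow{P} \sigma_M(X_{\cdot 1}). \tag{7.36}$$

We can rewrite $\hat\sigma_1^2$ as

$$\hat\sigma_1^2 = \frac{1}{T-1}\sum_{t=1}^{T}(X_{t1} - \bar X_1)^2 = \frac{2}{T(T-1)}\sum_{t<t'}h(X_{t1}, X_{t'1}),$$

where $\bar X_1 := \sum_{t=1}^T X_{t1}/T$ and $h(X_{t1}, X_{t'1}) = (X_{t1} - X_{t'1})^2/2$. Thus, $\hat\sigma_1^2$ is a U-statistic with kernel function $h$. Using Lemma 7.2 with Assumptions (A6) and (A8), we have $\sqrt T(\hat\sigma_1^2 - \Sigma_{11}) \xrightarrow{d} Z_1$, where $Z_1$ is a Gaussian random variable with mean 0. Using the delta method, we have $\sqrt T(\hat\sigma_1 - \sqrt{\Sigma_{11}}) \xrightarrow{d} Z_2$ for another mean-zero Gaussian random variable $Z_2$. Combining this with (7.36) and applying Slutsky's theorem, we have $\sqrt T(\hat c - c) \xrightarrow{d} Z_3$ for some Gaussian random variable $Z_3$. Thus, we have

$$|\hat c - c| = O_P(1/\sqrt T). \tag{7.37}$$

Combining (7.34), (7.35), and (7.37), we have the desired result. □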
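The U-statistic representation of $\hat\sigma_1^2$ and the structure $\hat{\mathbf D} = \hat c\,\hat{\mathbf R}$ used in (7.34) can be illustrated with a short Python sketch. It assumes $\sigma_M$ is the median absolute deviation about the median without any consistency constant and uses `numpy.median`, which averages the two middle order statistics when $T$ is even (a minor difference from $Q(\cdot;1/2)$ that does not affect this illustration); the function names are hypothetical.

```python
import numpy as np

def variance_u_stat(x):
    """Sample variance as an order-2 U-statistic with kernel h(x, y) = (x - y)^2 / 2."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    total = 0.0
    for t in range(T - 1):
        total += np.sum((x[t] - x[t + 1:]) ** 2) / 2.0   # kernel summed over pairs (t, t'), t' > t
    return 2.0 / (T * (T - 1)) * total

def mad(x):
    """Median absolute deviation about the median (no consistency constant)."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))

def d_hat(X):
    """Sketch of D_hat = c_hat * R_hat with c_hat = sigma_hat_1 / sigma_hat_{M,1}, as in (7.34)."""
    mads = np.array([mad(X[:, j]) for j in range(X.shape[1])])   # diagonal of R_hat
    c_hat = np.sqrt(variance_u_stat(X[:, 0])) / mads[0]           # ratio computed from column 1 only
    return c_hat * np.diag(mads)

rng = np.random.default_rng(2)
X = rng.standard_normal((5000, 3)) * np.array([1.0, 2.0, 5.0])   # Gaussian columns with sd 1, 2, 5
print(np.round(np.diag(d_hat(X)), 2))
```

For Gaussian columns the ratio of standard deviation to MAD is the same for every coordinate, so the diagonal of `d_hat(X)` should be close to the column standard deviations.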
7.2 Proof of Theorem 3.1


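Proof. Denote $\mathbf a = \mathbf D\mathbf w$. Applying a second-order Taylor expansion entrywise to $\sin(\pi\hat{\mathbf T}/2)$ around $\pi\mathbf T/2$, we have

$$
\begin{aligned}
\mathbf w^\top(\hat{\boldsymbol\Sigma} - \boldsymbol\Sigma)\mathbf w &= \mathbf a^\top\Big\{\sin\Big(\frac\pi2\hat{\mathbf T}\Big) - \sin\Big(\frac\pi2\mathbf T\Big)\Big\}\mathbf a\\
&= \underbrace{\mathbf a^\top\Big\{\cos\Big(\frac\pi2\mathbf T\Big)\circ\frac\pi2(\hat{\mathbf T} - \mathbf T)\Big\}\mathbf a}_{A_1} + \underbrace{\mathbf a^\top\Big\{-\frac12[\sin(\theta_{jk})]\circ\Big(\frac\pi2\Big)^2(\hat{\mathbf T} - \mathbf T)\circ(\hat{\mathbf T} - \mathbf T)\Big\}\mathbf a}_{A_2},
\end{aligned}
$$

where for each $j, k \in \{1, \ldots, d\}$, $\theta_{jk}$ lies between $\pi T_{jk}/2$ and $\pi\hat T_{jk}/2$. Using Lemma 7.6 and Assumption (A4), we have

$$|A_2| \le \frac{\pi^2}{8}\|\mathbf a\|_1^2\|\hat{\mathbf T} - \mathbf T\|_{\max}^2 = O_P\Big(\frac{\|\mathbf a\|_1^2\log d}{T}\Big) = O_P\Big(\frac{\log d}{T}\Big). \tag{7.38}$$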

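Here the first inequality is due to $|\sin(\theta_{jk})| \le 1$ and the fact that, for any vectors $\mathbf v_1, \mathbf v_2 \in \mathbb R^d$ and any matrix $\mathbf M \in \mathbb R^{d\times d}$,

$$|\mathbf v_1^\top\mathbf M\mathbf v_2| \le \|\mathbf v_1\|_1\|\mathbf M\mathbf v_2\|_\infty \le \|\mathbf M\|_{\max}\|\mathbf v_1\|_1\|\mathbf v_2\|_1. \tag{7.39}$$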
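Inequality (7.39) and the Taylor remainder bound in (7.38) are deterministic, so they can be spot-checked on random inputs. The sketch below uses a symmetric matrix with entries in $[-1,1]$ as a stand-in for $\mathbf T$ and a small symmetric perturbation as $\hat{\mathbf T}$; these inputs and the function names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

def check_739(d=6, trials=200):
    """Check |v1' M v2| <= ||v1||_1 ||M v2||_inf <= ||M||_max ||v1||_1 ||v2||_1 on random inputs."""
    for _ in range(trials):
        v1, v2 = rng.standard_normal(d), rng.standard_normal(d)
        M = rng.standard_normal((d, d))
        lhs = abs(v1 @ M @ v2)
        mid = np.abs(v1).sum() * np.abs(M @ v2).max()
        rhs = np.abs(M).max() * np.abs(v1).sum() * np.abs(v2).sum()
        if lhs > mid + 1e-12 or mid > rhs + 1e-12:
            return False
    return True

def check_738(d=6, trials=200):
    """Check |A_2| <= (pi^2 / 8) ||a||_1^2 ||T_hat - T||_max^2 for the exact Taylor remainder."""
    h = np.pi / 2
    for _ in range(trials):
        S = rng.uniform(-1, 1, (d, d))
        T = (S + S.T) / 2                            # symmetric stand-in for Kendall's tau matrix
        E = rng.uniform(-0.1, 0.1, (d, d))
        T_hat = T + (E + E.T) / 2                    # small symmetric perturbation
        a = rng.standard_normal(d)
        A1 = a @ (np.cos(h * T) * h * (T_hat - T)) @ a                 # first-order term
        A2 = a @ (np.sin(h * T_hat) - np.sin(h * T)) @ a - A1          # exact remainder
        bound = np.pi**2 / 8 * np.abs(a).sum() ** 2 * np.abs(T_hat - T).max() ** 2
        if abs(A2) > bound + 1e-12:
            return False
    return True

print(check_739(), check_738())
```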

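Next, we focus on $A_1$. We can expand $A_1$ as

$$A_1 = \underbrace{\frac{2}{T(T-1)}\sum_{t<t'}\mathbf a^\top\Big\{g(\mathbf R_t, \mathbf R_{t'})\circ\cos\Big(\frac\pi2\mathbf T\Big)\Big\}\mathbf a}_{U_T} - \theta, \tag{7.40}$$

where $g(\cdot)$ is defined in Equation (3.1) and $\theta := \mathbf a^\top\{\cos(\frac\pi2\mathbf T)\circ\frac\pi2\mathbf T\}\mathbf a$. Note that $U_T$ is a U-statistic of order 2 whose kernel satisfies

$$\Big|\mathbf a^\top\Big\{g(\mathbf R_t, \mathbf R_{t'})\circ\cos\Big(\frac\pi2\mathbf T\Big)\Big\}\mathbf a\Big| \le \frac\pi2\max_{j,k}\big|\mathrm{sign}(R_{tj} - R_{t'j})\,\mathrm{sign}(R_{tk} - R_{t'k})\big|\,\|\mathbf a\|_1^2 \le \frac\pi2\|\mathbf D\|_{\max}^2\|\mathbf w\|_1^2.$$

Thus, $g(\cdot)$ is a bounded kernel function. Assumption (A3) guarantees that $\{\mathbf R_t\}_{t\in\mathbb Z}$ is also $\beta$-mixing, with mixing coefficients $\beta(n)$ controlled by (A3). Thus, by Lemma 7.2, we have

$$\frac{\sqrt T A_1}{\sigma} = \frac{\sqrt T(U_T - \theta)}{\sigma} \xrightarrow{d} Z, \tag{7.41}$$

where $Z \sim N(0,1)$ is a standard Gaussian random variable. By Slutsky's theorem, combining the above equation with (7.38) leads to the desired result. □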
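The order-2 U-statistic structure behind (7.40) is easiest to see for the sample Kendall's tau matrix itself. The following sketch computes $\hat{\mathbf T}$ from pairwise sign vectors and applies the entrywise transform $\sin(\pi\hat{\mathbf T}/2)$; for bivariate Gaussian data this transform targets the correlation coefficient. The function names and the simulated example are illustrative assumptions, and $g(\cdot)$ from Equation (3.1) is not reproduced here.

```python
import numpy as np

def kendall_tau_matrix(R):
    """Sample Kendall's tau matrix as an order-2 U-statistic:
    T_hat[j, k] = 2 / (T (T - 1)) * sum_{t < t'} sign(R_tj - R_t'j) * sign(R_tk - R_t'k)."""
    R = np.asarray(R, dtype=float)
    T, d = R.shape
    acc = np.zeros((d, d))
    for t in range(T - 1):
        s = np.sign(R[t] - R[t + 1:])   # rows: sign vectors for the pairs (t, t'), t' > t
        acc += s.T @ s                   # adds sign(.) sign(.)^T for each pair
    return 2.0 / (T * (T - 1)) * acc

def sin_transform(tau_hat):
    """Entrywise sin(pi/2 * T_hat)."""
    return np.sin(np.pi / 2 * tau_hat)

rng = np.random.default_rng(3)
rho = 0.6
cov = np.array([[1.0, rho], [rho, 1.0]])
R = rng.multivariate_normal(np.zeros(2), cov, size=2000)
tau_hat = kendall_tau_matrix(R)
print(np.round(sin_transform(tau_hat), 3))
```

Every summand has entries in $\{-1, 0, 1\}$, which is the boundedness of the kernel used before (7.41).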
















7.3 Proof of Theorem 3.2 

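Proof. Similar to the proof of Theorem 3.1, we can expand $\mathbf w^\top(\boldsymbol\Sigma^* - \boldsymbol\Sigma)\mathbf w$ as

$$\mathbf w^\top(\boldsymbol\Sigma^* - \boldsymbol\Sigma)\mathbf w = \underbrace{\mathbf a^\top\Big\{\cos\Big(\frac\pi2\mathbf T\Big)\circ\frac\pi2(\mathbf T^* - \mathbf T)\Big\}\mathbf a}_{A_1^*} + \underbrace{\mathbf a^\top\Big\{-\frac12[\sin(\theta_{jk})]\circ\Big(\frac\pi2\Big)^2(\mathbf T^* - \mathbf T)\circ(\mathbf T^* - \mathbf T)\Big\}\mathbf a}_{A_2^*}. \tag{7.42}$$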




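Let $R^* := \mathbf w^\top(\boldsymbol\Sigma^* - \boldsymbol\Sigma)\mathbf w$ and rewrite $A_1^*$ as

$$A_1^* = \underbrace{\frac{2}{T(T-1)}\sum_{t<t'}\mathbf a^\top\Big\{g(\mathbf R_t^*, \mathbf R_{t'}^*)\circ\cos\Big(\frac\pi2\mathbf T\Big)\Big\}\mathbf a}_{U_T^*} - \theta.$$

Recall that $g(\cdot)$ is a bounded kernel function and that Assumption (A3) implies that the process $\{\mathbf R_t\}_{t\in\mathbb Z}$ is $\beta$-mixing, with mixing coefficients $\beta(n)$ controlled by (A3). By Lemma 7.4 and Assumption (A2), we then have

$$\mathrm{Var}^*(\sqrt T U_T^*) - \mathrm{Var}(\sqrt T U_T) = o_P(\sigma^2),$$

where $U_T$ is defined in Equation (7.40). Moreover, by (7.41), we have $\mathrm{Var}(\sqrt T U_T) = \sigma^2\{1 + o(1)\}$. Thus, we have

$$\mathrm{Var}^*(\sqrt T A_1^*) = \mathrm{Var}^*(\sqrt T U_T^*) = \sigma^2\{1 + o_P(1)\}. \tag{7.43}$$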


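Next, we focus on the asymptotics of $\mathrm{Var}^*(\sqrt T A_2^*)$. By (7.39), we have

$$|A_2^*| \le \frac{\pi^2}{8}\|\mathbf a\|_1^2\|\mathbf T^* - \mathbf T\|_{\max}^2.$$

By the circular block bootstrap procedure, the process $\{\mathbf R_t^*\}$ is still a $\phi$-mixing process, with a decaying mixing coefficient $\phi(n)$ (depending on some $\delta_2 > 0$), as long as $\epsilon > \epsilon_0/(1 - \epsilon_0)$. Thus, by Lemma 7.6, we have $\|\mathbf T^* - \mathbf T\|_{\max} = O_P(\sqrt{\log d/T})$. Hence $A_2^* = O_P(\log d/T)$ and, accordingly,

$$\mathrm{Var}^*(\sqrt T A_2^*) \le T\,\mathrm E^*\{(A_2^*)^2\} = O_P\Big(\frac{(\log d)^2}{T}\Big) = o_P(\sigma^2), \tag{7.44}$$

where $\mathrm E^*$ is the bootstrap expectation conditional on $\{\mathbf R_t\}_{t=1}^T$. Combining Equations (7.42), (7.43), and (7.44), we have

$$
\begin{aligned}
\mathrm{Var}^*(\sqrt T R^*) &= \mathrm{Var}^*\{\sqrt T(A_1^* + A_2^*)\} = \mathrm{Var}^*(\sqrt T A_1^*) + \mathrm{Var}^*(\sqrt T A_2^*) + 2\,\mathrm{Cov}^*(\sqrt T A_1^*, \sqrt T A_2^*)\\
&\le \mathrm{Var}^*(\sqrt T A_1^*) + \mathrm{Var}^*(\sqrt T A_2^*) + 2\sqrt{\mathrm{Var}^*(\sqrt T A_1^*)\,\mathrm{Var}^*(\sqrt T A_2^*)}\\
&= \sigma^2\{1 + o_P(1)\}.
\end{aligned}
\tag{7.45}
$$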
On the other hand, we also have

$$\mathrm{Var}^*(\sqrt T R^*) \ge \mathrm{Var}^*(\sqrt T A_1^*) + \mathrm{Var}^*(\sqrt T A_2^*) - 2\sqrt{\mathrm{Var}^*(\sqrt T A_1^*)\,\mathrm{Var}^*(\sqrt T A_2^*)} = \sigma^2\{1 + o_P(1)\}. \tag{7.46}$$

Combining (7.45) and (7.46) completes the proof.


□ 
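The bootstrap variances in (7.43)-(7.46) are computed over circular block bootstrap resamples. As a hedged sketch, the snippet below implements a standard circular block bootstrap with a fixed block length `block_len` (a hypothetical tuning parameter; the paper's own block-length rule is not reproduced) and checks the Cauchy-Schwarz sandwich used in (7.45) and (7.46) on two stand-in statistics computed from the resamples.

```python
import numpy as np

def circular_block_bootstrap(x, block_len, rng):
    """One circular block bootstrap resample: wrap the series around a circle,
    draw block start points uniformly, and concatenate blocks of length block_len."""
    x = np.asarray(x)
    T = len(x)
    n_blocks = -(-T // block_len)                                    # ceil(T / block_len)
    starts = rng.integers(0, T, size=n_blocks)                       # uniform on {0, ..., T-1}
    idx = (starts[:, None] + np.arange(block_len)[None, :]) % T      # indices wrap past T-1
    return x[idx.ravel()[:T]]                                        # trim to the original length

def variance_sandwich(a1, a2):
    """Empirical version of (7.45)-(7.46): Var(A1 + A2) lies within
    Var(A1) + Var(A2) -/+ 2 sqrt(Var(A1) Var(A2)) by Cauchy-Schwarz."""
    v1, v2, v = np.var(a1), np.var(a2), np.var(a1 + a2)
    half_width = 2.0 * np.sqrt(v1 * v2)
    return v1 + v2 - half_width <= v + 1e-12 and v <= v1 + v2 + half_width + 1e-12

rng = np.random.default_rng(4)
T = 500
e = rng.standard_normal(T + 1)
x = e[1:] + 0.5 * e[:-1]                                     # a simple MA(1) series, for illustration only
reps = [circular_block_bootstrap(x, block_len=20, rng=rng) for _ in range(300)]
a1 = np.array([r.mean() - x.mean() for r in reps])          # stand-in for the leading term A_1^*
a2 = np.array([(r.mean() - x.mean()) ** 2 for r in reps])   # stand-in for the remainder A_2^*
print(variance_sandwich(a1, a2))
```

When the remainder's variance is negligible relative to the leading term, both bounds collapse to the leading variance, which is the mechanism behind (7.45)-(7.46).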


7.4 Proof of Theorem 3.5 


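Proof. Denote $\mathbf a := \mathbf D\mathbf w$. We can write

$$\mathbf w^\top(\hat{\boldsymbol\Sigma}^s - \boldsymbol\Sigma)\mathbf w = \underbrace{\mathbf a^\top\Big\{\sin\Big(\frac\pi2\hat{\mathbf T}^s\Big) - \sin\Big(\frac\pi2\mathbf T\Big)\Big\}\mathbf a}_{B_1} + \underbrace{\mathbf w^\top\hat{\mathbf D}\sin\Big(\frac\pi2\hat{\mathbf T}^s\Big)\hat{\mathbf D}\mathbf w - \mathbf w^\top\mathbf D\sin\Big(\frac\pi2\hat{\mathbf T}^s\Big)\mathbf D\mathbf w}_{B_2}. \tag{7.47}$$

By the same arguments as in the proof of Theorem 3.1, we have

$$\frac{\sqrt{T_s}\,B_1}{\sigma} \xrightarrow{d} Z, \tag{7.48}$$

where $Z \sim N(0,1)$ is a standard Gaussian random variable. It remains to show that $B_2$ is asymptotically negligible. Using (7.39), we have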


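$$
\begin{aligned}
|B_2| &\le \Big|\mathbf w^\top\hat{\mathbf D}\sin\Big(\frac\pi2\hat{\mathbf T}^s\Big)(\hat{\mathbf D} - \mathbf D)\mathbf w\Big| + \Big|\mathbf w^\top(\hat{\mathbf D} - \mathbf D)\sin\Big(\frac\pi2\hat{\mathbf T}^s\Big)\mathbf D\mathbf w\Big|\\
&\le \Big\|\sin\Big(\frac\pi2\hat{\mathbf T}^s\Big)\Big\|_{\max}\|(\hat{\mathbf D} - \mathbf D)\mathbf w\|_1(\|\hat{\mathbf D}\mathbf w\|_1 + \|\mathbf D\mathbf w\|_1)\\
&\le \|\mathbf w\|_1^2\|\hat{\mathbf D} - \mathbf D\|_{\max}(\|\hat{\mathbf D}\|_{\max} + \|\mathbf D\|_{\max}).
\end{aligned}
$$

Using Lemma 7.9 and Assumption (A5), we have $|B_2| = O_P(\sqrt{\log d/T}) = o_P(\sigma/\sqrt{T_s})$. Together with (7.47) and (7.48), using Slutsky's theorem, we have the desired result. □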
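The bound on $|B_2|$ in the proof of Theorem 3.5 only uses $|\sin(\cdot)| \le 1$, (7.39), and the diagonal structure of $\mathbf D$ and $\hat{\mathbf D}$. A quick numerical check with randomly generated inputs (all illustrative; `b2_and_bound` is a hypothetical helper):

```python
import numpy as np

def b2_and_bound(d=8, seed=5):
    """Return B_2 from (7.47) and the bound ||w||_1^2 ||D_hat - D||_max (||D_hat||_max + ||D||_max)."""
    rng = np.random.default_rng(seed)
    S = rng.uniform(-1, 1, (d, d))
    K = np.sin(np.pi / 2 * (S + S.T) / 2)               # stand-in for sin(pi/2 * T_hat^s), |K_jk| <= 1
    D = np.diag(rng.uniform(0.5, 2.0, d))               # diagonal scale matrix
    D_hat = D + np.diag(rng.uniform(-0.05, 0.05, d))    # perturbed diagonal estimate
    w = rng.standard_normal(d)
    B2 = w @ D_hat @ K @ D_hat @ w - w @ D @ K @ D @ w
    bound = (np.abs(w).sum() ** 2 * np.abs(D_hat - D).max()
             * (np.abs(D_hat).max() + np.abs(D).max()))
    return B2, bound

for seed in range(5):
    B2, bound = b2_and_bound(seed=seed)
    print(f"seed={seed}: |B_2|={abs(B2):.4f}, bound={bound:.4f}")
```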


7.5 Proof of Corollary 3.1 

Proof. By (3.5), we have $P(|\hat w_j/w_j - 1| > t) \le \exp(-CTt^2)$. Thus, we further have

$$P\Big(\max_j|\hat w_j/w_j - 1| > t\Big) \le \sum_{j=1}^d P(|\hat w_j/w_j - 1| > t) \le \exp(\log d - CTt^2).$$

To simplify the rate of convergence, we set $t = \sqrt{3\log d/(CT)}$, which gives

$$P\Big(\max_j|\hat w_j/w_j - 1| > \sqrt{3\log d/(CT)}\Big) \le 1/d.$$

Thus, as $(T, d)$ go to infinity, we have $\max_j|\hat w_j/w_j - 1| = O_P(\sqrt{\log d/T})$. This gives us an upper bound on the convergence rate of $\|\hat{\mathbf w} - \mathbf w\|_1$:


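$$\|\hat{\mathbf w} - \mathbf w\|_1 = \sum_{j=1}^d|\hat w_j - w_j| = \sum_{j=1}^d|w_j|\Big|\frac{\hat w_j}{w_j} - 1\Big| \le \|\mathbf w\|_1\cdot\max_j\Big|\frac{\hat w_j}{w_j} - 1\Big| = O_P\Big(\sqrt{\frac{\log d}{T}}\Big). \tag{7.49}$$

Similar to (7.47), we can decompose $\hat{\mathbf w}^\top\hat{\boldsymbol\Sigma}^s\hat{\mathbf w} - \mathbf w^\top\boldsymbol\Sigma\mathbf w$ into

$$\hat{\mathbf w}^\top\hat{\boldsymbol\Sigma}^s\hat{\mathbf w} - \mathbf w^\top\boldsymbol\Sigma\mathbf w = B_1 + \underbrace{\hat{\mathbf w}^\top\hat{\mathbf D}\sin\Big(\frac\pi2\hat{\mathbf T}^s\Big)\hat{\mathbf D}\hat{\mathbf w} - \mathbf w^\top\mathbf D\sin\Big(\frac\pi2\hat{\mathbf T}^s\Big)\mathbf D\mathbf w}_{B_3}, \tag{7.50}$$

where $B_1$ is defined in (7.47). As in the proof of Theorem 3.5, we still have (7.48). Regarding $B_3$, since $\sin(\pi\hat{\mathbf T}^s/2)$ is symmetric with entries bounded by 1 in absolute value, writing $B_3 = (\hat{\mathbf D}\hat{\mathbf w} - \mathbf D\mathbf w)^\top\sin(\frac\pi2\hat{\mathbf T}^s)(\hat{\mathbf D}\hat{\mathbf w} + \mathbf D\mathbf w)$ and applying (7.39) gives $|B_3| \le \|\hat{\mathbf D}\hat{\mathbf w} - \mathbf D\mathbf w\|_1\|\hat{\mathbf D}\hat{\mathbf w} + \mathbf D\mathbf w\|_1$. Using the triangle inequality, we have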


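$$
\begin{aligned}
|B_3| &\le \big(\|\hat{\mathbf D}(\hat{\mathbf w} - \mathbf w)\|_1 + \|(\hat{\mathbf D} - \mathbf D)\mathbf w\|_1\big)\big(\|\hat{\mathbf D}\hat{\mathbf w}\|_1 + \|\mathbf D\mathbf w\|_1\big)\\
&\le \big(\|\hat{\mathbf D}\|_{\max}\|\hat{\mathbf w} - \mathbf w\|_1 + \|\hat{\mathbf D} - \mathbf D\|_{\max}\|\mathbf w\|_1\big)\big(\|\hat{\mathbf D}\|_{\max}\|\hat{\mathbf w}\|_1 + \|\mathbf D\|_{\max}\|\mathbf w\|_1\big).
\end{aligned}
$$

Using (7.49) and Lemma 7.9, we can conclude that $|B_3| = O_P(\sqrt{\log d/T})$. Plugging it into (7.50) and using Slutsky's theorem, we have the desired result. □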
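Three elementary facts drive the proof of Corollary 3.1: the union bound with $t = \sqrt{3\log d/(CT)}$ gives $\exp(\log d - CTt^2) = d^{-2} \le 1/d$; the $\ell_1$ bound (7.49); and the identity $\mathbf x^\top\mathbf K\mathbf x - \mathbf y^\top\mathbf K\mathbf y = (\mathbf x - \mathbf y)^\top\mathbf K(\mathbf x + \mathbf y)$ for symmetric $\mathbf K$, applied with $\mathbf x = \hat{\mathbf D}\hat{\mathbf w}$ and $\mathbf y = \mathbf D\mathbf w$. The sketch below checks each on illustrative inputs; $C$, $T$, $d$, and the helper names are arbitrary choices.

```python
import math
import numpy as np

def union_bound_tail(d, C, T):
    """exp(log d - C T t^2) evaluated at t = sqrt(3 log d / (C T)); this equals d^{-2}."""
    t = math.sqrt(3 * math.log(d) / (C * T))
    return math.exp(math.log(d) - C * T * t**2)

def corollary_checks(d=10, seed=6):
    """Check (7.49) and the quadratic-form identity and bound used for B_3 on random inputs."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(0.5, 1.5, d) * rng.choice([-1.0, 1.0], d)   # nonzero weights
    w_hat = w * (1 + rng.uniform(-0.1, 0.1, d))                  # perturbed weights
    l1_ok = np.abs(w_hat - w).sum() <= np.abs(w).sum() * np.max(np.abs(w_hat / w - 1)) + 1e-12
    S = rng.uniform(-1, 1, (d, d))
    K = np.sin(np.pi / 2 * (S + S.T) / 2)                        # symmetric, entries in [-1, 1]
    D = np.diag(rng.uniform(0.5, 2.0, d))
    D_hat = 1.02 * D
    x, y = D_hat @ w_hat, D @ w
    B3 = x @ K @ x - y @ K @ y
    identity_ok = abs(B3 - (x - y) @ K @ (x + y)) < 1e-10
    bound_ok = abs(B3) <= np.abs(x - y).sum() * np.abs(x + y).sum() + 1e-12
    return l1_ok, identity_ok, bound_ok

print(union_bound_tail(d=100, C=2.0, T=1000), corollary_checks())
```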


7.6 Proof of Theorem 3.6 

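Proof. Let $\mathbf K^{s*} = \sin(\pi\mathbf T^{s*}/2)$ and $\mathbf K = \sin(\pi\mathbf T/2)$. We can decompose $R^* := \mathbf w^\top\hat{\mathbf D}\mathbf K^{s*}\hat{\mathbf D}\mathbf w - \mathbf w^\top\mathbf D\mathbf K\mathbf D\mathbf w$ into two parts:

$$R^* = \underbrace{\mathbf w^\top\mathbf D\mathbf K^{s*}\mathbf D\mathbf w - \mathbf w^\top\mathbf D\mathbf K\mathbf D\mathbf w}_{B_1^*} + \underbrace{\mathbf w^\top\hat{\mathbf D}\mathbf K^{s*}\hat{\mathbf D}\mathbf w - \mathbf w^\top\mathbf D\mathbf K^{s*}\mathbf D\mathbf w}_{B_2^*}. \tag{7.51}$$

By similar arguments as in the proof of Theorem 3.2, we have

$$\mathrm{Var}^*(\sqrt{T_s}\,B_1^*) = \sigma^2\{1 + o_P(1)\}. \tag{7.52}$$

Next, we show that $\mathrm{Var}^*(\sqrt{T_s}\,B_2^*) = o_P(\sigma^2)$. We can upper bound $\mathrm{Var}^*(B_2^*)$ by

$$
\begin{aligned}
\mathrm{Var}^*(B_2^*) &= \mathrm{Var}^*\{\mathbf w^\top\hat{\mathbf D}\mathbf K^{s*}(\hat{\mathbf D} - \mathbf D)\mathbf w + \mathbf w^\top(\hat{\mathbf D} - \mathbf D)\mathbf K^{s*}\mathbf D\mathbf w\}\\
&\le \mathrm{Var}^*\{\mathbf w^\top\hat{\mathbf D}\mathbf K^{s*}(\hat{\mathbf D} - \mathbf D)\mathbf w\} + \mathrm{Var}^*\{\mathbf w^\top(\hat{\mathbf D} - \mathbf D)\mathbf K^{s*}\mathbf D\mathbf w\}\\
&\quad + 2\sqrt{\mathrm{Var}^*\{\mathbf w^\top\hat{\mathbf D}\mathbf K^{s*}(\hat{\mathbf D} - \mathbf D)\mathbf w\}}\sqrt{\mathrm{Var}^*\{\mathbf w^\top(\hat{\mathbf D} - \mathbf D)\mathbf K^{s*}\mathbf D\mathbf w\}}.
\end{aligned}
\tag{7.53}
$$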
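For any random matrix $\mathbf X := (\mathbf R_1, \ldots, \mathbf R_m)^\top \in \mathbb R^{m\times n}$ and fixed vectors $\mathbf v_1 \in \mathbb R^m$, $\mathbf v_2 \in \mathbb R^n$, let $\mathbf V$ be the $m\times m$ matrix with $(j,k)$ entry $\mathbf v_2^\top\mathrm{Cov}(\mathbf R_j, \mathbf R_k)\mathbf v_2$. It is easy to verify that

$$
\begin{aligned}
\mathrm{Var}(\mathbf v_1^\top\mathbf X\mathbf v_2) &= \mathbf v_1^\top\mathrm{Var}(\mathbf X\mathbf v_2)\mathbf v_1 = \mathbf v_1^\top\mathbf V\mathbf v_1 \le \|\mathbf v_1\|_1^2\max_{j,k}|\mathbf v_2^\top\mathrm{Cov}(\mathbf R_j, \mathbf R_k)\mathbf v_2|\\
&\le \|\mathbf v_1\|_1^2\|\mathbf v_2\|_1^2\max_{j_1,k_1,j_2,k_2}|\mathrm{Cov}(X_{j_1k_1}, X_{j_2k_2})|.
\end{aligned}
\tag{7.54}
$$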


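Now, writing $\mathbf v_1 = \hat{\mathbf D}\mathbf w$, $\mathbf v_2 = (\hat{\mathbf D} - \mathbf D)\mathbf w$, and $\mathbf X = \mathbf K^{s*}$, we have

$$
\begin{aligned}
\mathrm{Var}^*\{\mathbf w^\top\hat{\mathbf D}\mathbf K^{s*}(\hat{\mathbf D} - \mathbf D)\mathbf w\} &\le \|\hat{\mathbf D}\mathbf w\|_1^2\|(\hat{\mathbf D} - \mathbf D)\mathbf w\|_1^2\max_{j_1,k_1,j_2,k_2}|\mathrm{Cov}^*(K^{s*}_{j_1k_1}, K^{s*}_{j_2k_2})|\\
&\le \|\mathbf w\|_1^4\|\hat{\mathbf D}\|_{\max}^2\|\hat{\mathbf D} - \mathbf D\|_{\max}^2\max_{j_1,k_1,j_2,k_2}|\mathrm{Cov}^*(K^{s*}_{j_1k_1}, K^{s*}_{j_2k_2})|.
\end{aligned}
\tag{7.55}
$$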


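Note that $\hat{\mathbf D}$ only depends on $\{\mathbf R_t\}_{t=1}^T$ and is thus fixed under $\mathrm{Var}^*(\cdot)$. Moreover, the entries of $\mathbf K^{s*}$ are bounded by 1 in absolute value, so the bootstrap covariances in (7.55) are bounded by 1. Using Lemma 7.9 and (7.55), we have

$$\mathrm{Var}^*\{\sqrt{T_s}\,\mathbf w^\top\hat{\mathbf D}\mathbf K^{s*}(\hat{\mathbf D} - \mathbf D)\mathbf w\} = O_P\Big(\frac{T_s\log d}{T}\Big) = o_P(\sigma^2). \tag{7.56}$$

Similarly, we also have

$$\mathrm{Var}^*\{\sqrt{T_s}\,\mathbf w^\top(\hat{\mathbf D} - \mathbf D)\mathbf K^{s*}\mathbf D\mathbf w\} = o_P(\sigma^2). \tag{7.57}$$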


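Combining (7.53), (7.56), and (7.57), we have

$$\mathrm{Var}^*(\sqrt{T_s}\,B_2^*) = o_P(\sigma^2). \tag{7.58}$$

By (7.51), we have

$$\mathrm{Var}^*(\sqrt{T_s}\,R^*) \ge \mathrm{Var}^*(\sqrt{T_s}\,B_1^*) + \mathrm{Var}^*(\sqrt{T_s}\,B_2^*) - 2\sqrt{\mathrm{Var}^*(\sqrt{T_s}\,B_1^*)\,\mathrm{Var}^*(\sqrt{T_s}\,B_2^*)},$$

and similarly

$$\mathrm{Var}^*(\sqrt{T_s}\,R^*) \le \mathrm{Var}^*(\sqrt{T_s}\,B_1^*) + \mathrm{Var}^*(\sqrt{T_s}\,B_2^*) + 2\sqrt{\mathrm{Var}^*(\sqrt{T_s}\,B_1^*)\,\mathrm{Var}^*(\sqrt{T_s}\,B_2^*)}.$$

Using the above two inequalities with (7.52) and (7.58), we can conclude that $\mathrm{Var}^*(\sqrt{T_s}\,R^*) = \sigma^2\{1 + o_P(1)\}$. □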
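Inequality (7.54) holds for any covariance structure, including a sample covariance, so it can be checked exactly on simulated draws. The sketch below uses bounded entries (a sine of Gaussian noise plus a common factor), loosely mimicking $\mathbf K^{s*}$; the simulation design and the helper name `check_754` are purely illustrative.

```python
import numpy as np

def check_754(m=3, n=4, reps=2000, seed=7):
    """Return (Var(v1' X v2), ||v1||_1^2 ||v2||_1^2 max|Cov(X_{j1k1}, X_{j2k2})|) using sample moments."""
    rng = np.random.default_rng(seed)
    common = rng.standard_normal((reps, 1, 1))                 # shared factor -> correlated entries
    X = np.sin(common + rng.standard_normal((reps, m, n)))     # bounded entries in [-1, 1]
    v1, v2 = rng.standard_normal(m), rng.standard_normal(n)
    stat = np.einsum("j,rjk,k->r", v1, X, v2)                  # v1' X v2 for each simulated draw
    lhs = np.var(stat, ddof=1)
    cov = np.cov(X.reshape(reps, m * n), rowvar=False)         # (mn x mn) sample covariance of entries
    rhs = np.abs(v1).sum() ** 2 * np.abs(v2).sum() ** 2 * np.abs(cov).max()
    return lhs, rhs

lhs, rhs = check_754()
print(lhs, rhs)
```

Because both sides use the same $(N-1)$-normalized sample moments, the inequality holds exactly (up to rounding) for every simulated data set, not just asymptotically.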

7.7 Proofs of Theorems 3.3 and 3.4 

The proofs of Theorems 3.3 and 3.4 are close to those of Theorems 3.5 and 3.6. The main 
difference is that now plays the role of T, and T plays the role of T^. We accordingly 
omit the proofs. 